With their nearly completed "Whole Tale" project, Bertram Ludäscher, professor and director of the Center for Informatics Research in Science and Scholarship (CIRSS), and his team have created methods and tools that let scientists link executable computer code, data, and other information to online scholarly publications. Linking these materials helps ensure that published results can be reproduced and paves the way for new discoveries.
Whole Tale aims to build trust in computer-generated results while addressing the problems researchers encounter when they try to recreate those results, whether because computational environments have evolved or because individuals are using different computer systems. The project enables researchers to share "re-runnable" representations of their research, turning the resulting publications into "living articles" that let other researchers examine how computed results were originally obtained and recreate those results seamlessly.
"Any science that uses computers is affected by the computational reproducibility problems that Whole Tale aims to address," said Timothy McPhillips, senior research scientist at CIRSS.
Building on the Whole Tale concept, McPhillips and colleagues Craig Willis and Kacper Kowalik are working with Ludäscher on a new project that will create a certification for computer-generated results that others cannot recreate, either because the necessary data cannot be shared or because the computations require specialized computing resources. Examples of data that might be inaccessible include restricted census data and very large volumes of data streamed from satellites.
The goal of the new project, "TRAnsparency CErtified (TRACE): Trusting Computational Research Without Repeating It," is to certify the original execution of a computational workflow that produced findings or data products. The project will create tools that managers of computing centers can use to declare the dimensions of computational transparency their platforms support. These tools will also certify that a specific computational workflow was executed on the platform, and they will bundle artifacts, records of their execution, and technical metadata about their contents, certifying the bundle for dissemination.
Collaborating with the iSchool will be Lars Vilhuber, professor of economics at Cornell University and data editor for the journals of the American Economic Association, and Thu-Mai Christian, assistant director for archives at the Odum Institute for Research in Social Science at the University of North Carolina. Both are directly involved in enforcing journal policies for research transparency and reproducibility.
"A large number of studies in economics and political science rely on access to confidential or proprietary data that impede or prevent verification. Addressing this is a central goal of TRACE, and we believe that our approach will be broadly applicable to other fields," said Willis.
Part of the project will involve working with stakeholders from a range of disciplines, including ecology, bioinformatics, and computer science, to determine what information must be included for a computational result to earn certification.
"We need to determine from the stakeholders what would give practitioners in their disciplines confidence that computations were performed as described in a publication," McPhillips said.
The team has been awarded a three-year, $349,999 grant from the National Science Foundation (NSF) for this research.