Access to big data is crucial for credibility of computational research findings, says Stodden

 Photo by George Dyson

Think of a scientist at work, and you might picture someone at a lab bench, doing a physical experiment involving beakers or petri dishes and recording his or her findings, which will eventually form the basis for a scientific paper.

That’s the old model of science, says University of Illinois professor of library and information science Victoria Stodden, who was recently interviewed by the University of Illinois News Bureau.

Science is being transformed so that massive computation is central to scientific experiments, with scientists using computer code to analyze huge amounts of data. Computational science might be used to study climate change, to simulate the formation of galaxies, for biomolecular modeling or for mining a vast set of data looking for patterns.

But, Stodden says, this relatively new form of scientific inquiry has not yet developed standards for communicating the details of how the work was done or for validating results. The lack of such standards is causing a credibility crisis, Stodden says. Her research looks at the “reproducibility” of computational science – how findings can be verified and an experiment replicated or used as a basis for further research.

In the traditional form of scientific experimentation, a scientist keeps records and provides information about the conditions in the lab and the materials and variable factors in the experiment. Another scientist can run the same experiment to verify the results, or alter it to answer a related research question of his or her own. Such inquiries are central to scientific principles of rooting out errors in process and mistakes in interpretation.

To do those things in computational science, others must have access to the data and computer code used, Stodden said. But no standards are in place for sharing data and code.

“What if there is a mistake in the code? How do I find out if I can’t get to the code?” Stodden asked. “What does it mean to verify a (computer) simulation?”

She and a number of colleagues are advocating for open access to data and code. The problem is not a simple one, though. There are privacy issues involving human subject data, and proprietary issues where the research is the result of a partnership between a scientist and industry.

Then there are the technical issues of where to put software and data, who gets access to them, and whether they would still yield the same results as hardware and software systems are upgraded.

In numerous articles they’ve published in the last several years, Stodden and her colleagues have offered suggestions to scientists, journal editors and funding agencies for establishing standards to document the software and datasets used in published research results. Their suggestions for incentives to improve scientific integrity in general appeared online at sciencemag.org in late June.

Stodden was part of a group convened by the National Academies of Sciences last fall to look at how the research community can address instances where published research results (whether obtained through computational or more traditional methods of experimentation) cannot be reproduced. They wrote that the pressure to publish and the lengthening time it takes for postdoctoral fellows to obtain a faculty position and their first independent research grants are counterproductive to maintaining high standards of research integrity. They suggested incentives should be changed so researchers are rewarded for the quality and importance of their work, rather than the number of publications they produce.

Stodden said some scientific journals and funding agencies are already adopting open data and code policies for computational research.

The journals Nature and Science both require authors to make the data underlying their published results available upon request, and Science also requires access to computer codes involved in the creation or analysis of data. In 2011, the National Science Foundation began requiring grant applicants to include a data management plan, describing the availability and archiving of data produced by their research, as part of grant applications. And a 2003 report by the National Academies called for scientists to include data, algorithms and other information necessary to support the claims they make in reporting their findings, and for scientific journals to require sharing of software, algorithms and complex datasets.

“This will become standard, to share code and data,” Stodden predicted.
