Access to big data is crucial for credibility of computational research findings, says Stodden

 Photo by George Dyson

Think of a scientist at work, and you might picture someone at a lab bench, doing a physical experiment involving beakers or petri dishes and recording his or her findings, which will eventually form the basis for a scientific paper.

That’s the old model of science, says University of Illinois professor of library and information science Victoria Stodden, who was recently interviewed by the University of Illinois News Bureau.

Science is being transformed so that massive computation is central to scientific experiments, with scientists using computer code to analyze huge amounts of data. Computational science might be used to study climate change, to simulate the formation of galaxies, for biomolecular modeling or for mining a vast set of data looking for patterns.

But, Stodden says, this relatively new form of scientific inquiry has not yet developed standards for communicating the details of how the work was done or for validating results. The lack of such standards is causing a credibility crisis, Stodden says. Her research looks at the “reproducibility” of computational science – how findings can be verified and an experiment replicated or used as a basis for further research.

In the traditional form of scientific experimentation, a scientist keeps records and provides information about the conditions in the lab and the materials and variable factors in the experiment. Another scientist can run the same experiment to verify the results, or alter it to answer a related research question of his or her own. Such inquiries are central to scientific principles of rooting out errors in process and mistakes in interpretation.

To do those things in computational science, others must have access to the data and computer code used, Stodden said. But no standards are in place for sharing data and code.

“What if there is a mistake in the code? How do I find out if I can’t get to the code?” Stodden asked. “What does it mean to verify a (computer) simulation?”

She and a number of colleagues are advocating for open access to data and code. The problem is not a simple one, though. There are privacy issues involving human subject data, and proprietary issues where the research is the result of a partnership between a scientist and industry.

Then there are the technical issues of where to put software and data, who gets access to them, and whether they would still yield the same results as hardware and software systems are upgraded.

In numerous articles published over the last several years, Stodden and her colleagues have offered suggestions to scientists, journal editors and funding agencies for establishing standards to document the software and datasets used in published research results. Their suggestions for incentives to improve scientific integrity more broadly appeared online at sciencemag.org in late June.

Stodden was part of a group convened by the National Academies of Sciences last fall to look at how the research community can address instances where published research results (whether obtained through computational or more traditional methods of experimentation) cannot be reproduced. They wrote that the pressure to publish and the lengthening time it takes for postdoctoral fellows to obtain a faculty position and their first independent research grants are counterproductive to maintaining high standards of research integrity. They suggested incentives should be changed so researchers are rewarded for the quality and importance of their work, rather than the number of publications they produce.

Stodden said some scientific journals and funding agencies are already adopting open data and code policies for computational research.

The journals Nature and Science both require authors to make the data underlying their published results available upon request, and Science also requires access to the computer code involved in the creation or analysis of data. In 2011, the National Science Foundation began requiring grant applicants to include a data management plan, describing the availability and archiving of data produced by their research, as part of grant applications. And a 2003 report by the National Academies called for scientists to include data, algorithms and other information necessary to support the claims they make in reporting their findings, and for scientific journals to require sharing of software, algorithms and complex datasets.

“This will become standard, to share code and data,” Stodden predicted.
