Access to big data is crucial for credibility of computational research findings, says Stodden

 Photo by George Dyson

Think of a scientist at work, and you might picture someone at a lab bench, doing a physical experiment involving beakers or petri dishes and recording his or her findings, which will eventually form the basis for a scientific paper.

That’s the old model of science, says University of Illinois professor of library and information science Victoria Stodden, who was recently interviewed by the University of Illinois News Bureau.

Science is being transformed so that massive computation is central to scientific experiments, with scientists using computer code to analyze huge amounts of data. Computational science might be used to study climate change, to simulate the formation of galaxies, for biomolecular modeling or for mining a vast set of data looking for patterns.

But, Stodden says, this relatively new form of scientific inquiry has not yet developed standards for communicating the details of how the work was done or for validating results. The lack of such standards is causing a credibility crisis, Stodden says. Her research looks at the “reproducibility” of computational science – how findings can be verified and an experiment replicated or used as a basis for further research.

In the traditional form of scientific experimentation, a scientist keeps records and provides information about the conditions in the lab and the materials and variable factors in the experiment. Another scientist can run the same experiment to verify the results, or alter it to answer a related research question of his or her own. Such inquiries are central to scientific principles of rooting out errors in process and mistakes in interpretation.

To do those things in computational science, others must have access to the data and computer code used, Stodden said. But there are no standards in place for sharing data and code.

“What if there is a mistake in the code? How do I find out if I can’t get to the code?” Stodden asked. “What does it mean to verify a (computer) simulation?”

She and a number of colleagues are advocating for open access to data and code. The problem is not a simple one, though. There are privacy issues involving human subject data, and proprietary issues where the research is the result of a partnership between a scientist and industry.

Then there are the technical issues of where to store software and data, who gets access to them and whether the code would still yield the same results as hardware and software systems are upgraded.

In numerous articles they’ve published in the last several years, Stodden and her colleagues have offered suggestions to scientists, journal editors and funding agencies for establishing standards to document the software and datasets used in published research results. Their suggestions for incentives to improve scientific integrity generally appeared online at sciencemag.org in late June.

Stodden was part of a group convened by the National Academies of Sciences last fall to look at how the research community can address instances where published research results (whether obtained through computational or more traditional methods of experimentation) cannot be reproduced. They wrote that the pressure to publish and the lengthening time it takes for postdoctoral fellows to obtain a faculty position and their first independent research grants are counterproductive to maintaining high standards of research integrity. They suggested incentives should be changed so researchers are rewarded for the quality and importance of their work, rather than the number of publications they produce.

Stodden said some scientific journals and funding agencies are already adopting open data and code policies for computational research.

The journals Nature and Science both require authors to make the data underlying their published results available upon request, and Science also requires access to computer codes involved in the creation or analysis of data. In 2011, the National Science Foundation began requiring grant applicants to include a data management plan, describing the availability and archiving of data produced by their research, as part of grant applications. And a 2003 report by the National Academies called for scientists to include data, algorithms and other information necessary to support the claims they make in reporting their findings, and for scientific journals to require sharing of software, algorithms and complex datasets.

“This will become standard, to share code and data,” Stodden predicted.
