Access to big data is crucial for credibility of computational research findings, says Stodden

 Photo by George Dyson

Think of a scientist at work, and you might picture someone at a lab bench, doing a physical experiment involving beakers or petri dishes and recording his or her findings, which will eventually form the basis for a scientific paper.

That’s the old model of science, says University of Illinois professor of library and information science Victoria Stodden, who was recently interviewed by the University of Illinois News Bureau.

Science is being transformed so that massive computation is central to scientific experiments, with scientists using computer code to analyze huge amounts of data. Computational science might be used to study climate change, to simulate the formation of galaxies, for biomolecular modeling or for mining a vast set of data looking for patterns.

But, Stodden says, this relatively new form of scientific inquiry has not yet developed standards for communicating the details of how the work was done or for validating results. The lack of such standards is causing a credibility crisis, Stodden says. Her research looks at the “reproducibility” of computational science – how findings can be verified and an experiment replicated or used as a basis for further research.

In the traditional form of scientific experimentation, a scientist keeps records and provides information about the conditions in the lab and the materials and variable factors in the experiment. Another scientist can run the same experiment to verify the results, or alter it to answer a related research question of his or her own. Such inquiries are central to scientific principles of rooting out errors in process and mistakes in interpretation.

To do those things in computational science, others must have access to the data and computer code used, Stodden said. But there are no standards in place for sharing data and code.

“What if there is a mistake in the code? How do I find out if I can’t get to the code?” Stodden asked. “What does it mean to verify a (computer) simulation?”

She and a number of colleagues are advocating for open access to data and code. The problem is not a simple one, though. There are privacy issues involving human subject data, and proprietary issues where the research is the result of a partnership between a scientist and industry.

Then there are the technical issues of where to put software and data, who gets access to them, and whether they would still yield the same results as hardware and software systems are upgraded.

In numerous articles published over the last several years, Stodden and her colleagues have offered suggestions to scientists, journal editors and funding agencies for establishing standards to document the software and datasets used in published research results. Their suggestions for incentives to improve scientific integrity more generally appeared online at sciencemag.org in late June.

Stodden was part of a group convened by the National Academies of Sciences last fall to look at how the research community can address instances where published research results (whether obtained through computational or more traditional methods of experimentation) cannot be reproduced. They wrote that the pressure to publish and the lengthening time it takes for postdoctoral fellows to obtain a faculty position and their first independent research grants are counterproductive to maintaining high standards of research integrity. They suggested incentives should be changed so researchers are rewarded for the quality and importance of their work, rather than the number of publications they produce.

Stodden said some scientific journals and funding agencies are already adopting open data and code policies for computational research.

The journals Nature and Science both require authors to make the data underlying their published results available upon request, and Science also requires access to computer codes involved in the creation or analysis of data. In 2011, the National Science Foundation began requiring grant applicants to include a data management plan, describing the availability and archiving of data produced by their research, as part of grant applications. And a 2003 report by the National Academies called for scientists to include data, algorithms and other information necessary to support the claims they make in reporting their findings, and for scientific journals to require sharing of software, algorithms and complex datasets.

“This will become standard, to share code and data,” Stodden predicted.

