Access to big data is crucial for credibility of computational research findings, says Stodden

 Photo by George Dyson

Think of a scientist at work, and you might picture someone at a lab bench, doing a physical experiment involving beakers or petri dishes and recording his or her findings, which will eventually form the basis for a scientific paper.

That’s the old model of science, says University of Illinois professor of library and information science Victoria Stodden, who was recently interviewed by the University of Illinois News Bureau.

Science is being transformed so that massive computation is central to scientific experiments, with scientists using computer code to analyze huge amounts of data. Computational science might be used to study climate change, to simulate the formation of galaxies, for biomolecular modeling or for mining a vast set of data looking for patterns.

But, Stodden says, this relatively new form of scientific inquiry has not yet developed standards for communicating the details of how the work was done or for validating results. The lack of such standards is causing a credibility crisis, Stodden says. Her research looks at the “reproducibility” of computational science – how findings can be verified and an experiment replicated or used as a basis for further research.

In the traditional form of scientific experimentation, a scientist keeps records and provides information about the conditions in the lab and the materials and variable factors in the experiment. Another scientist can run the same experiment to verify the results, or alter it to answer a related research question of his or her own. Such inquiries are central to scientific principles of rooting out errors in process and mistakes in interpretation.

In order to do those things in computational science, others must have access to the data and computer code used, Stodden said. But no standards are in place for sharing data and code.

“What if there is a mistake in the code? How do I find out if I can’t get to the code?” Stodden asked. “What does it mean to verify a (computer) simulation?”

She and a number of colleagues are advocating for open access to data and code. The problem is not a simple one, though. There are privacy issues involving human subject data, and proprietary issues where the research is the result of a partnership between a scientist and industry.

Then there are the technical issues of where to host software and data, who gets access to them, and whether the code would yield the same results as hardware and software systems are upgraded.
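The verification problem Stodden describes is concrete: a later re-run only counts as a check if readers know what environment produced the original numbers and what the original output actually was. As a minimal sketch of that idea (not drawn from Stodden's own work, and using a hypothetical output file `results.csv` and helper names), a computational study could publish a small manifest recording the software environment and a checksum of its results alongside the paper:

```python
# Minimal sketch: record the computing environment and a fingerprint of the
# output so a later re-run on upgraded systems can be compared against it.
# File name and function names are illustrative placeholders.
import hashlib
import json
import platform
import sys

def describe_environment():
    """Capture the interpreter and operating system the analysis ran under."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
    }

def fingerprint_results(path):
    """Return a SHA-256 checksum of the results file, so a re-run can be
    compared byte-for-byte with the originally published output."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # "results.csv" stands in for whatever the analysis produced.
    manifest = {
        "environment": describe_environment(),
        "results_sha256": fingerprint_results("results.csv"),
    }
    print(json.dumps(manifest, indent=2))
```

A manifest like this does not solve the access or privacy questions, but it illustrates the kind of documentation a sharing standard would ask authors to archive with their code and data.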

In numerous articles published over the last several years, Stodden and her colleagues have offered suggestions to scientists, journal editors and funding agencies for establishing standards to document the software and datasets used in published research results. Their broader suggestions for incentives to improve scientific integrity appeared online at sciencemag.org in late June.

Stodden was part of a group convened by the National Academies of Sciences last fall to look at how the research community can address instances where published research results (whether obtained through computational or more traditional methods of experimentation) cannot be reproduced. They wrote that the pressure to publish and the lengthening time it takes for postdoctoral fellows to obtain a faculty position and their first independent research grants are counterproductive to maintaining high standards of research integrity. They suggested incentives should be changed so researchers are rewarded for the quality and importance of their work, rather than the number of publications they produce.

Stodden said some scientific journals and funding agencies are already adopting open data and code policies for computational research.

The journals Nature and Science both require authors to make the data underlying their published results available upon request, and Science also requires access to computer codes involved in the creation or analysis of data. In 2011, the National Science Foundation began requiring grant applicants to include a data management plan, describing the availability and archiving of data produced by their research, as part of grant applications. And a 2003 report by the National Academies called for scientists to include data, algorithms and other information necessary to support the claims they make in reporting their findings, and for scientific journals to require sharing of software, algorithms and complex datasets.

“This will become standard, to share code and data,” Stodden predicted.
