Is code enough? Stodden, Marinov research focuses on providing code rather than generated data with research

Sharing and reusing research data is becoming increasingly common in the scientific world, allowing researchers to more easily build on the work of others as they seek new discoveries.

A new research project being conducted by Associate Professor Victoria Stodden and Illinois Computer Science Professor Darko Marinov aims to answer key questions about how researchers can reliably share the code used to generate their data rather than the more costly data itself.

"Our question was, when is it possible to save only the code that produced simulated data, and that’s all I need to save, and when do I also need to save the data?" Stodden said. "Simulation codes can produce massive amounts of data, for example petabytes of data. If I can rerun the code and regenerate the data, in theory I don't even need to save the data. For what types of codes is that possible? That's exactly the question we're trying to answer."

The National Science Foundation is funding the work by Stodden and Marinov, providing $300,000 over two years. Stodden, who also is an Illinois Computer Science faculty affiliate, is the PI on the project. Marinov is the co-PI.

As Stodden explains, the format for scholarly articles has changed little in decades. It provides only a small space to discuss how researchers derived their results.

But as computation has become more integral to research across virtually every scientific field, that format has become inadequate, she said.

"There's such an amount of complexity – the computer can do X calculations per second. So how do you actually explain the increased complexity of computational research in words in a small section in a paper? It can be very, very difficult," Stodden said.

Now some journals, she said, have begun to require researchers to publish their data and code along with their findings.

Stodden and Marinov, an expert on the testing and reliability of software, wondered whether providing the code alone could reliably allow the results of a given paper to be reproduced. And if it is the code that accompanies the published research, what kind of standard should it meet?

"If code is going to travel with this scholarly output, the community will need to come to some type of agreement regarding code standards," Stodden said.

For their project, the two are focusing on physics research as an example because of its intensive computational needs.

In preliminary work using articles from the Journal of Computational Physics, Stodden and her group tried to replicate the computational results from 55 articles and were unable to reproduce any. After contacting the authors, Stodden says they came away with the impression that many believed reproducing their computational results would be straightforward, something they found not to be the case.

Eventually, Stodden and Marinov hope to determine whether and how code could be reliably substituted for data for a wide range of fields.
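
To make the idea of substituting code for data concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' method: the `simulate` function is a hypothetical stand-in for a real simulation code, and the seed, step count, and fingerprint scheme are assumptions chosen for the example. The sketch keeps the code, a seed, and a short checksum instead of the full output, then regenerates the output and checks that it matches.

```python
import hashlib
import random
import struct


def simulate(seed: int, steps: int) -> list[float]:
    """Hypothetical stand-in for a simulation code: a seeded random walk."""
    rng = random.Random(seed)  # private generator, so no hidden global state
    position = 0.0
    trajectory = []
    for _ in range(steps):
        position += rng.gauss(0.0, 1.0)
        trajectory.append(position)
    return trajectory


def fingerprint(values: list[float]) -> str:
    """SHA-256 over the exact 8-byte representation of each output value."""
    digest = hashlib.sha256()
    for value in values:
        digest.update(struct.pack("<d", value))
    return digest.hexdigest()


# At publication time: keep the code, the seed, and a small fingerprint,
# rather than the full output.
published = fingerprint(simulate(seed=42, steps=10_000))

# Later: rerun the code and compare against the stored fingerprint.
regenerated = fingerprint(simulate(seed=42, steps=10_000))
print("regenerated output matches:", regenerated == published)
```

A bit-for-bit match like this is the easy case. Real simulation codes can depend on library versions, hardware, or parallel execution order, and those are the kinds of conditions under which rerunning the code may not reproduce the original data exactly.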

"We want to learn how to do better scientific software, software that is more reliable, and that researchers can trust more," Stodden said. "These questions have come about not because the scientific community isn't doing a good job; they came about because computation is so important, and increasingly so. We're chasing fascinating opportunities here."

