Stodden proposes guide for developing common data science approaches

The use of data science tools in research has exploded across campuses, from engineering and science to the humanities and social sciences. But there is no established data science discipline and no recognized way for various academic fields to develop and integrate accepted data science processes into research.

Associate Professor Victoria Stodden has proposed a framework for guiding researchers and curriculum development in data science and for aiding policy and funding decisions. She outlines the approach in the journal Communications of the ACM.

Stodden has studied issues of reproducibility of research findings for more than a decade. Now, the widespread use of computational tools for research has initiated discussions about transparency, bias, ethics and other topics. These ideas are broader than any particular field, and researchers from different fields need a common framework for how to approach and talk about them, she said.

Stodden said her approach will help define data science as a scientific discipline in its own right; provide a way to have a common conversation across disciplines; encourage the development of data-driven research methods and train researchers and scientists to use them; help researchers agree on the most important issues in the emerging field of data science; and help consumers of computational research understand how results were produced.

"I'm hoping it's a way to unify the conversations going on now–to help them evolve and share knowledge in a way to leverage and learn from what other people are doing–and talk about what's going on across different disciplines," Stodden said.

The framework helps identify which issues can be generalized across disciplines and which are specific to a discipline, she said.

Stodden's proposal builds on the concept of the data life cycle, which information scientists use to describe the various stages of a dataset. Her data science life cycle looks not only at datasets but also at the tools of computational research, such as computer code and software, and at the research findings.

The data science life cycle would allow researchers to look at the computational research process from data collection through analysis, validation and dissemination to, ultimately, how the research findings are used in public policy discussions, she said. It would bring into the conversation concepts of transparency, reproducibility of results, how results are interpreted, potential bias and ethics.

An example of the data science life cycle, which describes the stages of data science research. Courtesy of Victoria Stodden


"It's a framework for how to bring all these different topics together and think about what it means to have a field of data science," Stodden said. "With more strategic thinking about what data science means, and what it means to leverage these tools, we will be doing better science."

The data science life cycle recognizes the need for preserving data, software and computational information and making them widely available after results are published, allowing for reproducibility.

Her approach will also help guide the development of a data science curriculum, she said, providing a way to see where existing courses fit and where new courses may need to be developed.

"For a student seeking to do advanced coursework in data science, it can appear that statistics is not computational enough, computer science isn’t data inference-focused enough, information science is too broad, and the domain sciences don’t provide a broad enough pedagogical agenda in data science," she wrote.
