This project examines the impact of different research funding structures on the training of future scientists, particularly graduate students and postdoctoral fellows, and on their subsequent outcomes. Our proposed research begins by examining how research (and most training) is funded and conducted. We classify projects by scale (large or small, as measured by funding size), by whether they involve multiple researchers, and by whether they span multiple institutions. We construct different measures of project teams and capture the subsequent trajectories of the students and postdoctoral fellows during and after their contact with those teams. We use a natural experiment and quasi-experimental statistical techniques to separate the effect of funding structures from the other factors contributing to...

The goal of this research is to help researchers develop and use relatively simple tools to describe species in a way that makes those descriptions easier to share with other scientists and easier for computers to process and analyze. The approach is bottom-up and iterative, involving rapid prototyping of tools, combining existing tools, and tailoring applications originally developed for other purposes to this scientific activity. Innovations from this project are applicable to the long-term development of open source software initiatives serving labs throughout the world. The project provides rich, real-world training for graduate students in library and information sciences, preparing them to become much-needed cross-disciplinary researchers in a field desperate for...

Data Observation Network for Earth (DataONE) is a collaborative, global project that is laying the groundwork for a new, innovative approach to conducting environmental science research. DataONE is a distributed framework and sustainable infrastructure poised to resolve many of the key challenges that hinder the realization of more global, open, and reproducible science, through four interrelated cyberinfrastructure (CI) activities:

  • significantly expanding the volume and diversity of data available to researchers for large-scale scientific innovation and discovery;
  • incorporating innovative and high-value science-enabling features into the DataONE CI;
  • maintaining and improving core software and...

Taxonomists are scientists who describe the world’s biodiversity. These descriptions of millions of species allow scientists to do many different kinds of research, including basic biology, environmental science, climate research, agriculture, and medicine. The problem is that describing any one species is not easy. The language taxonomists use to describe their data is complex and typically not easily understood by computers, or even by other scientists. This makes it harder to search for patterns across millions of species documented by thousands of researchers over many decades of work worldwide.

The goal of this research is to help researchers develop and use relatively simple tools to describe species in a way that makes those descriptions easier to share...

How can we be rule-compliant and still innovate? The collection and analysis of human-centered data are governed by multiple sets of norms and regulations. Problems can arise when researchers are unaware of the applicable rules, uninformed about their practical meaning and compatibility, or insufficiently skilled in implementing them. We are developing and delivering educational modules to address this issue.


Jul. 9, 2018

Assistant Professor Matthew Turk will present the Whole Tale research project at the 17th annual Scientific Computing with Python conference (SciPy 2018), which will be held July 9-15 in Austin, Texas. The conference brings together participants from industry, academia, and government for tutorials, talks, and developer sprints.

Turk will give the talk, "Sneaking Data into Containers with the Whole Tale," with Kacper Kowalik, a research scientist at the National Center for Supercomputing Applications (NCSA). The goal of the Whole Tale research project is to enable researchers to examine, transform, and then seamlessly republish research data, creating "living articles" that will lead to new discovery by allowing researchers to construct representations and syntheses of data. 

"In this talk, we'll describe how the project leverages existing...

Jul. 6, 2018

Professor and Center for Informatics Research in Science and Scholarship (CIRSS) Director Bertram Ludäscher will be the keynote speaker for the 7th International Provenance and Annotation Workshop (IPAW) during ProvenanceWeek 2018, which will be held July 9-13 at King's College in London. While provenance information has long been recognized as crucial metadata in the information sciences, provenance research has become increasingly important in computer science as well.

During ProvenanceWeek, researchers from computer science and related disciplines will participate in the two main events, the biennial IPAW and the annual TaPP (Theory and Practice of Provenance) workshop, as well as in affiliated events that focus on novel directions for provenance.

In his opening keynote at IPAW, "From Workflows to Provenance and Reproducibility: Looking...

Jul. 3, 2018

Assistant Professor Jodi Schneider will discuss her medical informatics research at the 9th Conference of the International Society for the Study of Argumentation, which will be held July 3-6 at the University of Amsterdam. The conference brings together scholars from a variety of disciplines who are working in the field of argumentation theory. 

Schneider will give two talks during a session focusing on argumentation in health. She will present "Innovations in Reasoning About Health: The Case of the Randomized Controlled Trial," which was coauthored by Sally Jackson, a professor of communication at Illinois. The researchers' recent work introduces the concept of a "warranting device" to analyze innovations in drawing conclusions.

"One example of a warranting device is the Randomized Controlled Trial, or RCT," said Schneider. "Now it's considered the 'gold standard' for reasoning about...

Jun. 14, 2018

Associate Professor Victoria Stodden will be a keynote speaker at the second annual Building Research Integrity Through Reproducibility conference, which will be held on June 15 at the University of Utah. She will also moderate the panel, "What Universities Do (and Don't Do) to Influence (or not) Research Reproducibility."

In her keynote presentation, "Computational Reproducibility," she will frame reproducibility in data-enabled scientific discovery, provide a brief history of efforts towards reproducibility within the scientific community, discuss the problems in replicating computational findings, and examine the lifecycle of data science. According to Stodden, the future of data science will include a major effort to develop infrastructure that supports the entire data science lifecycle, "promoting good scientific practice downstream like transparency and reproducibility."

Stodden's research...

Jun. 4, 2018

Associate Professor Victoria Stodden will be a keynote speaker at the 2018 IEEE Data Science Workshop, which will be held June 4-6 in Lausanne, Switzerland. The workshop will bring together researchers from the academic disciplines of data science, including signal processing, statistics, machine learning, data mining, and computer science, along with industry experts from fields such as personalized health and medicine, earth and environmental science, applied physics, finance and economics, and intelligent manufacturing. 

Stodden will give the keynote, "Reproducibility and Generalizability in Data-enabled Discovery."

Abstract: As computation becomes central to scientific research and discovery – bringing us the field of Data Science – new questions arise regarding the implementation, dissemination, and evaluation of methods that underlie scientific claims. I present a framework for conceptualizing the affordances that support Data Science including computational...

Apr. 5, 2018

Associate Professor Victoria Stodden will present her research on reproducibility at the University of Delaware Department of Computer & Information Sciences Distinguished Speaker Lecture on April 6. The theme for the lecture series is "rising stars in a scientific world of convergence."

According to Stodden, the rate of production, collection, and analysis of data, together with the speed at which computational infrastructure is changing (e.g., technologies for cloud computing, network capabilities, and high performance computing systems), implies a need for extreme agility in computationally enabled research.

"In my talk, 'The Science of Computational Reproducibility,' I will outline a research agenda for the science of reproducibility that responds to the opportunities created by this rapid evolution in research environments, addressing reliability and...

Oct. 3, 2017

Professor and Center for Informatics Research in Science and Scholarship (CIRSS) Director Bertram Ludäscher and collaborators are presenting their joint work and tools for data quality, cleaning, and provenance at the 33rd Annual Biodiversity Information Standards conference, TDWG 2017, from October 1-6 in Ottawa, Canada. The annual conference provides a forum for developing standards and demonstrating new technologies and tools for biodiversity informatics. This year's theme is "Data Integration in a Big Data Universe: Associating Occurrences with Genes, Phenotypes, and Environments."

Three of the abstracts presented at TDWG 2017 are outcomes of the Kurator project, a collaboration between Illinois and the Museum of Comparative Zoology (MCZ) at Harvard University. Kurator is a suite of biodiversity...