School of Information Sciences

Team Illinois Developing Advanced Provenance Tools for DataONE

Bertram Ludäscher, Professor and Director, Center for Informatics Research in Science and Scholarship

Provenance information describes the origin and history of artifacts. Because of the vital role played by data and workflow provenance in support of transparency and reproducibility in computational and data science, creating tools for capturing and using provenance information is an important yet challenging task.

Post-doctoral Research Associate Yang Cao and Professor Bertram Ludäscher recently presented joint work on data provenance at the Data Observation Network for Earth (DataONE) All Hands Meeting in Santa Ana Pueblo, New Mexico. In their poster and system demonstration, jointly authored by a team of University of Illinois students and staff as well as collaborators from the UK, Cao and Ludäscher demonstrated how the YesWorkflow tool is "Revealing the Detailed History of Script Outputs with Hybrid Provenance Queries."1

In an earlier article for the Winter 2015/6 issue of DataONE News, "Your Data has a History, too: Towards Transparency and Reproducibility through Provenance,"2 Ludäscher discussed data provenance—how critical it is for transparency, data quality, and computational reproducibility, yet how difficult it is to make use of provenance information unless better tools are available. "Gathering provenance and then linking data, provenance, and software with each other and to publications is a complex and often labor-intensive, manual process. However, as more and more tools become 'provenance-aware' and allow scientists to record and share provenance information, there is hope that provenance management will become much easier and more seamless in the future," he said.

One such tool, YesWorkflow,3,4 is based on a simple annotation language for data analysis scripts. According to Ludäscher, "This language-independent, lightweight annotation approach not only yields an informative workflow model of a script, thus facilitating understanding and reuse of the script, but it can also be used to reconstruct runtime provenance information from script executions and link this information back to the scientist’s conceptual workflow. In this way, provenance can be the subject and driver of powerful queries against the scientist's own data, making provenance not only useful metadata for others, but letting scientists themselves immediately benefit from the provenance information they created."
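To make the annotation idea concrete, here is a minimal sketch in Python. It assumes YesWorkflow-style comment keywords (`@begin`, `@in`, `@out`, `@end`, with an optional `@uri` qualifier). The script contents, the file names, and the `extract_annotations` helper are hypothetical illustrations. The helper is not part of the YesWorkflow toolkit; it only lists the annotated steps and their inputs and outputs.

```python
import re

# A toy analysis script, held as a string, marked up with YesWorkflow-style
# comment annotations. The step name and file names are hypothetical.
ANNOTATED_SCRIPT = '''
# @begin clean_temperatures
# @in raw_readings @uri file:raw.csv
# @out clean_readings @uri file:clean.csv
def clean_temperatures(rows):
    return [r for r in rows if r is not None]
# @end clean_temperatures
'''


def extract_annotations(source):
    """Return (keyword, name) pairs, one per annotated comment line.

    Illustrative helper only (an assumption for this sketch); the real
    YesWorkflow tool builds a full workflow model from such comments.
    """
    pattern = re.compile(r"#\s*@(begin|end|in|out|param)\s+(\S+)")
    matches = (pattern.search(line) for line in source.splitlines())
    return [m.groups() for m in matches if m]


annotations = extract_annotations(ANNOTATED_SCRIPT)
for keyword, name in annotations:
    print(f"{keyword:5s} {name}")
```

Because the annotations live in ordinary comments, the same markup could be applied to scripts in other languages without changing how they run, which is the sense in which the approach is language-independent.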

DataONE is supported by the National Science Foundation and was developed to ensure the preservation, access, and reuse of science data via a federation of member nodes and coordinating nodes, an investigator toolkit, and a broad education and outreach program.

Ludäscher, director of the iSchool's Center for Informatics Research in Science and Scholarship (CIRSS), is a leading figure in data and knowledge management, focusing on the modeling, design, and optimization of scientific workflows, provenance, data integration, and knowledge representation. He joined the iSchool faculty in 2014 and is a faculty affiliate at NCSA and the Department of Computer Science. 

References

1. Yang Cao, Duc Vu, Qiwen Wang, Qian Zhang, Priyaa Thavasimani, Timothy McPhillips, Paolo Missier, Bertram Ludäscher (2016). Revealing the Detailed History of Script Outputs with Hybrid Provenance. Poster and System Demonstration, DataONE All Hands Meeting, September 20-22, Santa Ana Pueblo, New Mexico.
2. Bertram Ludäscher (2016). "Your Data has a History, too: Towards Transparency and Reproducibility through Provenance," DataONE News 4(2), Winter 2015/6.
3. YesWorkflow toolkit.
4. T. McPhillips, S. Bowers, K. Belhajjame, B. Ludäscher (2015). Retrospective Provenance Without a Runtime Provenance Recorder. 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP'15).


