Team Illinois Developing Advanced Provenance Tools for DataONE

Bertram Ludäscher, Professor and Director, Center for Informatics Research in Science and Scholarship

Provenance information describes the origin and history of artifacts. Because of the vital role played by data and workflow provenance in support of transparency and reproducibility in computational and data science, creating tools for capturing and using provenance information is an important yet challenging task.

Postdoctoral Research Associate Yang Cao and Professor Bertram Ludäscher recently presented joint work on data provenance at the Data Observation Network for Earth (DataONE) All Hands Meeting in Santa Ana Pueblo, New Mexico. In their poster and system demonstration, jointly authored by a team of University of Illinois students and staff and by collaborators from the UK, Cao and Ludäscher showed how the YesWorkflow tool is "Revealing the Detailed History of Script Outputs with Hybrid Provenance Queries" [1].

In an earlier article for the Winter 2015/6 issue of DataONE News, "Your Data has a History, too: Towards Transparency and Reproducibility through Provenance" [2], Ludäscher discussed data provenance: how critical it is for transparency, data quality, and computational reproducibility, and yet how difficult it is to make use of provenance information without better tools. "Gathering provenance and then linking data, provenance, and software with each other and to publications is a complex and often labor-intensive, manual process. However, as more and more tools become 'provenance-aware' and allow scientists to record and share provenance information, there is hope that provenance management will become much easier and more seamless in the future," he said.

One such tool, YesWorkflow [3, 4], is based on a simple annotation language for data analysis scripts. According to Ludäscher, "This language-independent, lightweight annotation approach not only yields an informative workflow model of a script, thus facilitating understanding and reuse of the script, but it can also be used to reconstruct runtime provenance information from script executions and link this information back to the scientist's conceptual workflow. In this way, provenance can be the subject and driver of powerful queries against the scientist's own data, making provenance not only useful metadata for others, but letting scientists themselves immediately benefit from the provenance information they created."
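To give a flavor of the annotation idea, the sketch below marks up a small, hypothetical Python script with comment tags modeled on YesWorkflow's `@begin`, `@end`, `@in`, and `@out` keywords, then reads those comments back to recover which block feeds data to which. The script, its block names, and the helpers `extract_blocks` and `data_links` are illustrative assumptions, not the YesWorkflow toolkit's own API (the real tool also understands further tags such as `@param` and `@uri`). Because the annotations are ordinary comments, the script still runs unchanged.

```python
import re

# A hypothetical analysis script annotated with YesWorkflow-style comments.
# Only the tag vocabulary (@begin/@end/@in/@out) follows YesWorkflow.
SCRIPT = '''
# @begin clean_and_average
# @in raw_readings
# @out mean_value

# @begin clean_data
# @in raw_readings
# @out clean_readings
clean_readings = [x for x in raw_readings if x is not None]
# @end clean_data

# @begin compute_mean
# @in clean_readings
# @out mean_value
mean_value = sum(clean_readings) / len(clean_readings)
# @end compute_mean

# @end clean_and_average
'''

# Matches a comment tag followed by a single name, e.g. "# @in raw_readings".
TAG = re.compile(r"#\s*@(begin|end|in|out)\s+(\w+)")


def extract_blocks(source):
    """Return {block_name: {"in": [...], "out": [...]}} for each @begin block.

    Simplified, hypothetical reader: other tags (@param, @uri, @as, ...) are ignored.
    """
    blocks, stack = {}, []
    for line in source.splitlines():
        match = TAG.search(line)
        if not match:
            continue
        tag, name = match.groups()
        if tag == "begin":
            stack.append(name)
            blocks[name] = {"in": [], "out": []}
        elif tag == "end":
            stack.pop()
        else:
            # @in / @out belong to the innermost open block.
            blocks[stack[-1]][tag].append(name)
    return blocks


def data_links(blocks):
    """List (producer, data_name, consumer) wherever an @out name matches an @in name."""
    links = []
    for producer, p in blocks.items():
        for consumer, c in blocks.items():
            if producer == consumer:
                continue
            for name in p["out"]:
                if name in c["in"]:
                    links.append((producer, name, consumer))
    return links


if __name__ == "__main__":
    blocks = extract_blocks(SCRIPT)
    print(data_links(blocks))
    # The annotated script is still ordinary, executable Python.
    namespace = {"raw_readings": [1.0, None, 3.0]}
    exec(SCRIPT, namespace)
    print(namespace["mean_value"])
```

Here the only recovered link is `clean_data` passing `clean_readings` to `compute_mean`, which is the kind of prospective workflow model the quote describes; linking that model to files and values observed at runtime is the part YesWorkflow's provenance queries build on.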

DataONE is supported by the National Science Foundation and was developed to ensure the preservation, access, and reuse of science data via a federation of member nodes and coordinating nodes, an investigator toolkit, and a broad education and outreach program.

Ludäscher, director of the iSchool's Center for Informatics Research in Science and Scholarship (CIRSS), is a leading figure in data and knowledge management, focusing on the modeling, design, and optimization of scientific workflows, provenance, data integration, and knowledge representation. He joined the iSchool faculty in 2014 and is a faculty affiliate at NCSA and the Department of Computer Science. 

References

[1] Yang Cao, Duc Vu, Qiwen Wang, Qian Zhang, Priyaa Thavasimani, Timothy McPhillips, Paolo Missier, Bertram Ludäscher (2016). Revealing the Detailed History of Script Outputs with Hybrid Provenance. Poster and System Demonstration, DataONE All Hands Meeting, September 20-22, Santa Ana Pueblo, New Mexico.
[2] Bertram Ludäscher (2016). “Your Data has a History, too: Towards Transparency and Reproducibility through Provenance,” DataONE News 4(2), Winter 2015/6.
[3] YesWorkflow toolkit.
[4] T. McPhillips, S. Bowers, K. Belhajjame, B. Ludäscher (2015). Retrospective Provenance Without a Runtime Provenance Recorder. 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP'15).
