Parulian defends dissertation

Doctoral candidate Nikolaus Parulian successfully defended his dissertation, "A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning," on June 29.

His committee included Professor Bertram Ludäscher (chair), Professor J. Stephen Downie, Associate Professor Jana Diesner, and Assistant Professor Nigel Bosch.

Abstract: Data cleaning is an essential component of data preparation in machine learning and other data science workflows. It is a time-consuming and error-prone task that can greatly affect the reliability of subsequent analyses. Tools must capture provenance information to ensure transparent and auditable data-cleaning processes. However, existing provenance models have limitations in tracing and querying changes at different levels of granularity. To address this, we proposed a new conceptual model that captures fine-grained retrospective provenance and extends it with prospective provenance to represent operations or workflows that change the datasets. This hybrid model allows powerful queries and supports advanced use cases like auditing data cleaning workflows. Additionally, we extended the model to present a conceptual model focusing on reusability and collaboration in data cleaning. It addresses scenarios where multiple users contribute to dataset changes and enables tracking of curator actions, identifying dependencies between cleaning operations, and facilitating collaboration. Through an experimental case study, we demonstrated the reusability of data-cleaning workflows, different users' contributions, and collaboration's effectiveness in improving data quality.


Related News

Layne-Worthey edits book on digital humanities and LIS

Glen Layne-Worthey, associate director for research support services for the HathiTrust Research Center (HTRC), and Isabel Galina, researcher at the Institute for Bibliographic Studies at the National University of Mexico, have edited a new book, The Routledge Companion to Libraries, Archives, and the Digital Humanities, which was recently released by Routledge.

Wang group to present at BigData 2024

Members of Associate Professor Dong Wang's research group, the Social Sensing and Intelligence Lab, will present their research at the 2024 IEEE International Conference on Big Data (BigData 2024), which will be held December 15-18 in Washington, D.C. BigData 2024 is the premier venue to present and discuss progress in research, development, standards, and applications of topics in artificial intelligence, machine learning, and big data analytics.

Book co-edited by Sayuno wins national award in Philippines

A book edited by Postdoctoral Research Associate Cheeno Marlo Sayuno and Eugene Evasco has received a National Book Award from the Republic of the Philippines. The award, sponsored by the National Book Development Board and the Manila Critics Circle, is an annual prize that honors the most outstanding titles written, designed, and published in the Philippines. 

Walters learns history of ATO through archives assistantship

When MSLIS student Deborah Walters was offered a graduate assistantship to work in the Alpha Tau Omega Archives, she viewed it as a "unique opportunity to have a hands-on independent experience in archives" that she couldn't pass up. Alpha Tau Omega (ATO) is a social fraternity that was founded at the Virginia Military Institute in 1865. Its archives are among the national fraternity collections housed at the Student Life and Culture Archives at the University of Illinois.

Antwi grateful for Balz Scholarship

MSLIS student Victoria Antwi is grateful for the financial support that she has received through the Balz Endowment Fund. An international student from Mampong-Nsuta in the Ashanti Region of Ghana, Antwi earned her bachelor’s degree in information studies in 2020 from the University of Ghana.
