Parulian defends dissertation

Doctoral candidate Nikolaus Parulian successfully defended his dissertation, "A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning," on June 29.

His committee included Professor Bertram Ludäscher (chair), Professor J. Stephen Downie, Associate Professor Jana Diesner, and Assistant Professor Nigel Bosch.

Abstract: Data cleaning is an essential component of data preparation in machine learning and other data science workflows. It is a time-consuming and error-prone task that can greatly affect the reliability of subsequent analyses. Tools must capture provenance information to ensure transparent and auditable data-cleaning processes. However, existing provenance models have limitations in tracing and querying changes at different levels of granularity. To address this, we proposed a new conceptual model that captures fine-grained retrospective provenance and extends it with prospective provenance to represent the operations or workflows that change the datasets. This hybrid model allows powerful queries and supports advanced use cases such as auditing data-cleaning workflows. Additionally, we extended this model into a conceptual model focused on reusability and collaboration in data cleaning. It addresses scenarios in which multiple users contribute to dataset changes, and it enables tracking of curator actions, identification of dependencies between cleaning operations, and collaboration. Through an experimental case study, we demonstrated the reusability of data-cleaning workflows, the contributions of different users, and the effectiveness of collaboration in improving data quality.

Related News

Spectrum Scholar Spotlight: Dalia Ortiz Pon

Twelve iSchool master's students were named 2024–2025 Spectrum Scholars by the American Library Association (ALA) Office for Diversity, Literacy, and Outreach Services. This "Spectrum Scholar Spotlight" series highlights the School's scholars. MSLIS student Dalia Ortiz Pon earned her bachelor's degree in Latina/Latino studies from San Francisco State University. 

Debnath datafies "The Bulletin"

MSIM student Tan Debnath, whose interests span data mining, statistical modeling, text mining, and digital humanities, joined the Center for Children's Books as a research assistant. He was tasked with building curation processes that would datafy seventy-five years' worth of archival issues of The Bulletin of the Center for Children's Books, one of the nation's leading children's book review journals.

He receives Amazon Research Award to improve monitoring of Earth’s ecosystem

A new project led by Professor Jingrui He aims to help scientists monitor disruptions to the Earth’s ecosystem, such as climate change. She recently received support for her work through an Amazon Research Award, which includes $60,000 in cash and an additional $40,000 in Amazon Web Services (AWS) credits.

iSchool undergraduates selected as 2025 Community-Academic Scholars

The Interdisciplinary Health Sciences Institute (IHSI) has selected BSIS student Dhanvi Puttur and BSIS+DS student Lara Terpetschnig as 2025 Community-Academic Scholars. Representing nineteen majors and nine minors in eight colleges and schools at the University of Illinois Urbana-Champaign and two additional universities, the eighteen scholars in this cohort encompass diverse fields of study, from community health to graphic design to statistics. 

Guan successfully defends dissertation

Doctoral candidate Yingjun Guan successfully defended his dissertation, "Disambiguating Academic Institution Names: A Comprehensive Study of Authority Files, Linguistic Variations, and Computational Evaluation in PubMed Affiliations," on April 28. 