Parulian defends dissertation

Doctoral candidate Nikolaus Parulian successfully defended his dissertation, "A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning," on June 29.

His committee included Professor Bertram Ludäscher (chair), Professor J. Stephen Downie, Associate Professor Jana Diesner, and Assistant Professor Nigel Bosch.

Abstract: Data cleaning is an essential component of data preparation in machine learning and other data science workflows. It is a time-consuming and error-prone task that can greatly affect the reliability of subsequent analyses. Tools must capture provenance information to ensure transparent and auditable data-cleaning processes. However, existing provenance models have limitations in tracing and querying changes at different levels of granularity. To address this, we proposed a new conceptual model that captures fine-grained retrospective provenance and extends it with prospective provenance to represent operations or workflows that change the datasets. This hybrid model allows powerful queries and supports advanced use cases like auditing data-cleaning workflows. Additionally, we extended the model to present a conceptual model focusing on reusability and collaboration in data cleaning. It addresses scenarios where multiple users contribute to dataset changes and enables tracking of curator actions, identifying dependencies between cleaning operations, and facilitating collaboration. Through an experimental case study, we demonstrated the reusability of data-cleaning workflows, the contributions of different users, and the effectiveness of collaboration in improving data quality.

