Parulian defends dissertation

Doctoral candidate Nikolaus Parulian successfully defended his dissertation, "A Conceptual Model for Transparent, Reusable, and Collaborative Data Cleaning," on June 29.

His committee included Professor Bertram Ludäscher (chair), Professor J. Stephen Downie, Associate Professor Jana Diesner, and Assistant Professor Nigel Bosch.

Abstract: Data cleaning is an essential component of data preparation in machine learning and other data science workflows. It is a time-consuming and error-prone task that can greatly affect the reliability of subsequent analyses. To make data-cleaning processes transparent and auditable, tools must capture provenance information. However, existing provenance models are limited in their ability to trace and query changes at different levels of granularity. To address this, we proposed a new conceptual model that captures fine-grained retrospective provenance and extends it with prospective provenance to represent the operations or workflows that change a dataset. This hybrid model supports expressive queries and advanced use cases such as auditing data-cleaning workflows. We then extended it into a conceptual model focused on reusability and collaboration in data cleaning. This extended model addresses scenarios in which multiple users contribute changes to a dataset, and it enables tracking curator actions, identifying dependencies between cleaning operations, and facilitating collaboration. Through an experimental case study, we demonstrated the reusability of data-cleaning workflows, the contributions of different users, and the effectiveness of collaboration in improving data quality.

