Junwei Deng's Preliminary Exam
PhD candidate Junwei Deng will present his dissertation proposal, "Towards an Actionable Data Lifecycle: A Data Attribution Perspective." His preliminary examination committee includes Assistant Professor Jiaqi Ma (Chair), Professor Jingrui He, Assistant Professor Haohan Wang (Member), and Professor Dilek Hakkani-Tur (Siebel School of Computing and Data Science).
Abstract
Data plays a pivotal role in modern Artificial Intelligence (AI) systems, yet systematically understanding how training data influences model behavior remains a fundamental challenge. Data attribution, which seeks to quantify the contribution of individual training data points or data subsets to model behavior, offers a principled approach to this challenge, with broad implications for data-centric AI applications ranging from data selection to copyright compensation. However, existing data attribution methods face several critical research gaps: poor computational scalability, limited applicability beyond standard assumptions, and an underdeveloped methodology for transforming attribution scores into actionable decisions across the data lifecycle. This proposal addresses a central research question: how can data attribution be made computationally scalable and broadly applicable to drive actionable decision-making across the AI data lifecycle?
We organize the dissertation proposal around four research sub-questions. First, we develop efficient ensembling strategies and establish a comprehensive library and benchmark suite to improve the scalability and deployability of data attribution methods. Second, we revisit the theoretical foundations of data attribution methods and derive principled formulations that relax standard assumptions. Third, we integrate data attribution into the training process of time series foundation models as a dynamic guiding signal for data augmentation. Fourth, we design an attribution-based royalty framework that connects training data contribution to economic compensation in music generative AI. This proposal contributes to the growing field of data-centric AI by investigating how data attribution can serve as a key factor in decision-making throughout the data lifecycle.
Questions? Contact Junwei Deng.