Abstract: One of the main problems in data minimization is determining the relevant data set. By combining the Chase, a universal tool for transforming databases, with data provenance, we can calculate an (anonymized) minimal sub-database of an original data set. To ensure reproducibility, the evaluations performed on the original data set must also be possible on the sub-database. To this end, we extend the Chase&Backchase with additional why-provenance to handle lost attribute values, null tuples, and duplicates that occur during query evaluation and its inversion. I am pleased to present the ProSA pipeline, which describes a method of data minimization using the Chase&Backchase extended with additional provenance.
Speaker bio: Tanja Auge studied Mathematics (B.Sc. 2014; M.Sc. 2016) and Computer Science (M.Sc. 2017) at the Universities of Hamburg and Rostock, Germany. She is currently completing her Ph.D. at the University of Rostock on the topic of "Provenance Management using Schema Mappings with Annotations." In October, she will move to the University of Regensburg, Germany.
This event is sponsored by iSchool's Conceptual Foundations Group (CFG).