Jill Naiman Presentation
Teaching Assistant Professor Jill Naiman will present "Uncertainty in Scientific and Scholarly Data Curation - Quantification and Communication of Computational Methods."
Abstract: With the proliferation of machine learning methods to curate scholarly data and publications, it is imperative that the affected communities understand how these newer methods can impact scientific results. In this talk I will discuss my computational work to develop methods and metrics for the digitization and curation of historical and newer scientific datasets in partnership with domain scientists. I start with an overview of my work on the digitization of the historical holdings of the NASA Astrophysics Data System, which serves as a repository of astronomy, physics, earth science, and heliophysics articles. I then discuss ongoing work to segment the biological images housed in the WormAtlas project – a central repository for electron microscopy images of model organisms widely used in the field of neuroscience. In both projects I highlight how the computational methods applied (object detection, segmentation, image processing, optical character recognition, and natural language processing) can produce imperfectly curated datasets or erroneous scientific results even when achieving high accuracies as measured through typical computational metrics.
All slides and alt-text for images will be made available the day before the talk through this link.
Bio: Jill Naiman is a teaching assistant professor in the School of Information Sciences at the University of Illinois Urbana-Champaign and a faculty affiliate at the National Center for Supercomputing Applications. After receiving her PhD in astronomy and astrophysics from the University of California, Santa Cruz, she was a National Science Foundation Postdoctoral Fellow, followed by an Institute for Theory and Computation Fellow, at the Harvard-Smithsonian Center for Astrophysics, where her work focused on computational hydrodynamics and data visualization.
Her current work focuses on automated methods for the digitization and curation of historical scientific documents and the development of metrics for assessing the accuracy of machine learning digitization methods on downstream scientific tasks. This work is funded through grants from NASA's Astrophysics Data Analysis Program, the National Institutes of Health, The Brinson Foundation, and the Fiddler Foundation. Additionally, she teaches courses in data visualization, data storytelling, and statistics, and has mentored over 40 graduate and undergraduate students on research projects.