Trustworthy Computational Science Speaker Series: Juliana Freire
Juliana Freire, Institute Professor at the Tandon School of Engineering and professor of computer science and engineering and data science at New York University, will present "Dataset Search for Data Discovery, Augmentation, and Explanation."
Abstract: Recent years have seen an explosion in our ability to collect and catalog immense amounts of data about our environment, society, and populace. Moreover, with the push towards transparency and open data, scientists, governments, and organizations are increasingly making structured data available on the Web and in various repositories and data lakes. Combined with advances in analytics and machine learning, the availability of such data should in theory allow us to make progress on many of our most important scientific and societal questions. However, this opportunity is often missed due to a central technical barrier: it is currently nearly impossible for domain experts to weed through the vast amount of available information to discover datasets that are needed for their specific application. While search engines have addressed the discovery problem for Web documents, there are many new challenges involved in supporting the discovery of structured data—from crawling the Web in search of datasets, to the need for dataset-oriented queries and new strategies to rank and display results. I will discuss these challenges and present our recent work in this area. In particular, I will introduce a new class of data-relationship queries that, given a dataset, identifies related datasets; I will describe a collection of methods that efficiently support different kinds of relationships that can be used for data explanation and augmentation; and I will demonstrate Auctus, an open-source dataset search engine that we have developed at the NYU Visualization, Imaging, and Data Analysis (VIDA) Center.
Juliana Freire is an Institute Professor at the Tandon School of Engineering and Professor of Computer Science and Engineering and Data Science at New York University. She served as the elected chair of ACM SIGMOD and as a council member of the Computing Community Consortium (CCC), and was the NYU lead investigator for the Moore-Sloan Data Science Environment, a grant awarded jointly to UW, NYU, and UC Berkeley. She develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. Her work spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, as well as different application areas, including urban analytics, misinformation, predictive modeling, and computational reproducibility. She is an active member of the database and Web research communities, with over 250 technical papers (including 12 award-winning papers), several open-source systems, and 12 U.S. patents. According to Google Scholar, her h-index is 66 and her work has received over 19,000 citations. She is an ACM Fellow, a AAAS Fellow, and the recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She was awarded the ACM SIGMOD Contributions Award in 2020. Her research has been funded by the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo!, and IBM. She received MSc and PhD degrees in computer science from the State University of New York at Stony Brook, and a BS degree in computer science from the Federal University of Ceará (Brazil).
This series, open to the public, is hosted by the Center for Informatics Research in Science and Scholarship (CIRSS). For the Spring 2024 schedule and access to previous talks, visit the Trustworthy Computational Science website. If you are interested in this speaker series, please subscribe to our speaker series calendar: Google Calendar or Outlook Calendar.
Questions? Contact Janet Eke.
This event is sponsored by the Center for Informatics Research in Science and Scholarship.