Trustworthy Computational Science Speaker Series: Marta Mattoso

Friday, April 26, 2024 11:00 AM - 12:00 PM

Marta Mattoso, professor at COPPE-Federal University of Rio de Janeiro, will present “Traceability for trust: applications and challenges.”

Abstract: Script applications, such as those in data science, span multiple systems that integrate legacy and newly developed software components to deliver value to models and scientific results. Traceability provides access to such end-to-end activities so that results can be trusted and reproduced. Hence, it becomes necessary to adopt techniques for tracking and correlating the relevant artifacts produced by script activities. Provenance data, as defined in W3C PROV, provides an abstraction that represents and correlates the artifacts to be tracked. In addition to representing metadata on those artifacts, traceability requires a derivation path, so that an artifact’s generation can be automatically followed. Provenance data has been added to frameworks that help execute scripts in data science, health, IoT, and other domains, with the aim of providing security, trust, reproducibility, and explainability of script results. However, provenance support is often limited to metadata of the artifacts, without access to their derivation paths, which limits trust and reproducibility. Using provenance representation for traceability in data science requires techniques to associate provenance data with a script execution without the cost and overhead of full-fledged data capture and process reengineering. Although provenance data has been around for many years, using and querying it is still a challenge. This talk highlights different uses of provenance for trust, such as in data science, threat detection, and the authenticity of artifacts. I will discuss current challenges for capturing provenance to trace back the artifacts’ derivation path, with examples of using provenance in machine learning scripts.

Marta Mattoso is a professor at COPPE-Federal University of Rio de Janeiro. Her research interests in data science include aspects of large-scale data management, among them the use of provenance data to support humans in the loop during the parallel execution of many computing tasks in high-performance environments. She has supervised 90 graduate students. She is a CNPq level 1B research productivity fellow. Her research is applied to real problems, addressing scientific experiments in computational science workflows, including machine learning. She coordinates research projects financed by national and international agencies. She is a member of the specialist team of the WorkflowsRI project in the USA. She is a member of ACM and IEEE and a founding member of the Brazilian Computer Society. She serves on international conference program committees and is a member of the editorial board of several international journals.

This series, open to the public, is hosted by the Center for Informatics Research in Science and Scholarship (CIRSS). For the Spring 2024 schedule and access to previous talks, visit the Trustworthy Computational Science website. If you are interested in this speaker series, please subscribe to our speaker series calendar: Google Calendar or Outlook Calendar.

Questions? Contact Janet Eke

This event is sponsored by the Center for Informatics Research in Science and Scholarship.