Haohan Wang presentation

"Toward a Principled Understanding of Trustworthy Machine Learning Methods with an Application on Decoding the Genomic Language of Alzheimer’s Disease at Personalized Level"

Abstract: The development of machine learning techniques has given us a new opportunity to analyze complex structured data at a large scale for scientific discovery, such as understanding the nature of complex human disorders. However, a naive application of machine learning methods, especially the black-box deep learning techniques developed in recent years, may produce plausible-looking but unreliable discoveries because the model learns spurious features or confounding factors. Therefore, developing machine learning tools that can incorporate structured human knowledge to deliver trustworthy discoveries is of great importance.

In this talk, motivated by evidence that a key challenge for trustworthy machine learning lies in the quality of the data, I will introduce a principled view that leads to robust machine learning methods, which learn the underlying signal in the data while remaining minimally influenced by confounding signals introduced by non-ideal data collection strategies. This principled view naturally leads to two concrete methods. The first uses deep learning models to read MRI scans for Alzheimer’s disease, aiming to offer physician-agreed interpretations of the disease’s pathology. The second, an ongoing effort, enables linear mixed models to decode the genomic language of the disease at a personalized level, agnostic to ethnicity.

Bio: Haohan Wang obtained his PhD from the Language Technologies Institute (LTI), School of Computer Science, at Carnegie Mellon University, where he worked with Professor Eric P. Xing. His research focuses on trustworthy machine learning and its applications to computational biology, in particular decoding the genomic language of Alzheimer’s disease. This work is supported by technical activities including statistical analysis and the development of deep learning methods, with an emphasis on analyzing data with methods that are minimally influenced by spurious signals. He was recognized as one of the Next Generation in Biomedicine by the Broad Institute of MIT and Harvard for his contributions to addressing confounding factors in deep learning.