Shufan Ming's Preliminary Exam

PhD candidate Shufan Ming will present her dissertation proposal, "Enhancing Accessibility of Biomedical Literature with Knowledge-Guided Large Language Models." Ming's preliminary examination committee includes Associate Professor Halil Kilicoglu, Chair; Assistant Professor Yue Guo; Professor Bertram Ludascher; and Associate Professor Vetle I. Torvik. 

Abstract

The dissemination of biomedical knowledge is crucial for transforming information in the literature into accessible, actionable insights that improve clinical applications, biomedical research, and public health literacy. With the exponential growth of biomedical publications, both researchers and the general public struggle to process the sheer volume of literature and face information overload. Researchers need efficient methods to extract and synthesize key information for evidence-based decision-making, while non-expert audiences need understandable, layperson-friendly biomedical knowledge to improve their health literacy and make informed health decisions.

Natural language processing (NLP) techniques, particularly large language models (LLMs), have recently emerged as promising tools for biomedical knowledge synthesis. However, LLMs have several limitations: they often lack true semantic understanding, struggle with domain-specific reasoning, and are prone to hallucinations and limited interpretability. Because these models rely primarily on statistical co-occurrence patterns learned from massive text corpora, their understanding of domain-specific biomedical knowledge can be shallow.

Biomedical domain knowledge, by contrast, is highly structured: semantically rich resources such as the Unified Medical Language System (UMLS), Medical Subject Headings (MeSH), and other knowledge sources explicitly define biomedical concepts and the relationships among them. These structured resources can complement the statistical learning mechanisms of LLMs, forming a synergistic framework that can improve the accuracy, interpretability, and usability of biomedical NLP applications.

This dissertation investigates how integrating structured biomedical knowledge, such as ontologies and controlled vocabularies, into transformer-based language models can improve predictive performance, interpretability, and reliability in biomedical NLP applications. Specifically, it focuses on three downstream tasks: (1) extracting relationships between entities in the biomedical literature to support research such as literature-based discovery; (2) summarizing biomedical abstracts into layperson-friendly summaries to improve public health literacy; and (3) classifying biomedical publication types to improve automatic literature indexing and support evidence synthesis for clinical decision-making.

Throughout this work, I combine LLMs with structured biomedical knowledge, leveraging the complementary strengths of symbolic and neural methods. Ultimately, this research aims to improve the dissemination of biomedical knowledge to both researchers and lay audiences.

Questions? Contact Shufan Ming.