Liri Fang's Preliminary Exam
PhD student Liri Fang will present her proposal defense, "Modeling of Graph and Text Data: Applications in Citation Networks, Taxonomy Expansion, and Claim Verification."
Abstract
In the digital era, we are confronted with massive amounts of unstructured data, such as textual documents, in all areas of scientific inquiry. For example, PubMed Central provided 9.4 million open-access articles as of 2023. This textual information is often enriched with relationships: scientific documents, for instance, are connected through citation links, coauthorship, and co-mentions of semantic entities. Effectively discovering and leveraging structure from this abundance presents both a significant challenge and a compelling opportunity in scientific literature, healthcare records, financial reports, social media content, and commercial platforms. This dissertation proposal explores the potential of machine learning to analyze (semi-)structured data (graphs) and unstructured data (text) through the following three applications: 1) improving the quality and reliability of citation networks by denoising and integrating data from nine citation sources, 2) jointly learning representations of textual and structural information for taxonomy completion, and 3) learning hidden structural information for language understanding in claim verification.
Large-scale graphs are ubiquitous in real-world applications. The three proposed applications cover scenarios in which the graph structure is noisy, incomplete, or hidden. The first application addresses noisy structure by integrating citation graph data for PubMed to identify and rectify unreliable citations. The second application focuses on expanding taxonomies with incoming concepts by jointly learning representations of rich textual semantics and graph structures. The third application leverages latent structural relationships embedded in the text to refine evidence-based claim verification. Together, these efforts aim to demonstrate how structured information modeling can improve language modeling applications.