Yingjun Guan’s Dissertation Defense

PhD candidate Yingjun Guan will defend his dissertation, "Disambiguating Academic Institution Names: A Comprehensive Study of Authority Files, Linguistic Variations, and Computational Evaluation in PubMed Affiliations." Guan's dissertation committee includes Associate Professor Vetle Torvik, Professor Stephen Downie, Professor Bertram Ludäscher, and Professor Allen Renear.

Abstract

This dissertation investigates the challenges of institutional name disambiguation (IND) in scholarly communication, focusing on the inconsistencies and ambiguities found in academic affiliation metadata. It examines variations in naming conventions, institutional hierarchies, and multilingual expressions that hinder accurate representation across digital library systems and bibliometric platforms. Through a comparative review of 21 authority files—including VIAF, ROR, and Wikidata—a new integrated authority dataset is developed to improve standardization.

The study further introduces a manually annotated dataset of PubMed affiliation records to analyze linguistic patterns, synonym usage, and structural inconsistencies in real-world data. It evaluates the coverage and performance of major authority files and computational tools using precision, recall, and other core metrics. The findings include an organized framework that categorizes the different types of linguistic ambiguities in institutional names, a benchmark dataset for future research, and practical insights into combining authority control with computational methods. Together, these efforts support more reliable affiliation parsing and enhance data integrity in bibliometrics, citation indexing, and digital scholarly infrastructures.

Questions? Contact Yingjun Guan.