Doctoral candidate Jenna Kim successfully defended her dissertation, "Evaluating Pre-Trained Language Modeling Approaches for Author Name Disambiguation," on June 11, 2024.
Her committee included Jana Diesner (chair), affiliate associate professor in the iSchool and professor at Technical University of Munich; Professor Bertram Ludäscher; Associate Professor Vetle Ingvald Torvik; and Assistant Professor Haohan Wang.
Abstract: Distinguishing between authors who share the same name or identifying instances where different names refer to the same individual remains a persistent challenge in bibliometric research. This complexity impedes accurate cataloging and indexing in digital libraries, affecting the integrity of academic databases and the reliability of scholarship evaluation based on bibliographic data. Although various machine learning methods have been explored to tackle the issue of author name disambiguation (AND), traditional methods often fail to capture the subtle linguistic and contextual nuances essential for effective disambiguation. This dissertation examines the application of pre-trained language models to AND within scholarly databases and identifies their potential and limitations compared to traditional machine learning approaches. This is a novel endeavor for improving the accuracy and functionality of digital library systems and bibliometric assessments. The findings confirm that pre-trained language models significantly outperform traditional approaches, demonstrating their ability to handle complex linguistic patterns and contextual cues vital for accurately differentiating between authors with similar names. Incorporating abstract text features boosts model performance, highlighting the critical role of semantic context in AND tasks.
