Academic institutions are a cornerstone of the scientific ecosystem: they employ the majority of researchers worldwide and contribute to the generation and dissemination of scientific knowledge. For aggregation at the institution level to be successful, institution names (at the campus level) need to be accurately identified, and a canonical (normalized) name needs to be used consistently. However, the institution names that appear in scholarly communication are often not in canonical form. For example, our campus is referred to both as the University of Illinois at Urbana-Champaign and as UIUC.
For this purpose, this dissertation studies academic institution name recognition (INR) and discusses what makes INR challenging. The work makes contributions in the following aspects: (1) it examines the existing authority files for academic institution names (both canonical names and synonyms), then compares and integrates them to produce a new authority file suited to the purpose of this dissertation; (2) it uses PubMed affiliation metadata as a case study to identify the main causes of institution name variation; (3) it leverages computational models to automate the process of INR and reveals how prevalent these difficulties are, with appropriate model evaluation and a discussion of performance.
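As a rough illustration of the task, the sketch below maps name variants found in an affiliation string to a canonical name through a small authority file. It is not the method developed in the dissertation: the `AUTHORITY` entries, the helper names (`normalize`, `build_index`, `recognize`), and the simple substring matching are all assumptions made for this example.

```python
import re

# Hypothetical, hand-written authority file: canonical name -> known variants.
# A real authority file would be far larger and curated from existing sources.
AUTHORITY = {
    "University of Illinois at Urbana-Champaign": [
        "UIUC",
        "University of Illinois Urbana-Champaign",
        "Univ. of Illinois at Urbana-Champaign",
    ],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def build_index(authority):
    """Map every normalized name (canonical or variant) to its canonical name."""
    index = {}
    for canonical, variants in authority.items():
        for name in [canonical, *variants]:
            index[normalize(name)] = canonical
    return index


def recognize(affiliation, index):
    """Return the canonical name whose variant occurs in the affiliation, else None."""
    padded = f" {normalize(affiliation)} "
    # Try longer variants first so the most specific name wins.
    for variant in sorted(index, key=len, reverse=True):
        if f" {variant} " in padded:
            return index[variant]
    return None


INDEX = build_index(AUTHORITY)
print(recognize("School of Information Sciences, UIUC, Champaign, IL", INDEX))
```

Exact lookup of this kind only recognizes variants that are already listed in the authority file, so misspellings, translations, and unlisted abbreviations are missed, which is one reason INR calls for more capable computational models.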
Yingjun Guan's committee includes Associate Professor Vetle Torvik (chair), Associate Professor Halil Kilicoglu (co-chair), Professor Bertram Ludäscher, and Professor Allen Renear.
Questions? Contact Yingjun Guan.