Members of Associate Professor Halil Kilicoglu's research lab won third prize, valued at $20,000, in the LitCoin Natural Language Processing (NLP) Challenge. The challenge, sponsored by the National Institutes of Health (NIH) National Center for Advancing Translational Sciences (NCATS), brought together government, medical research communities, and data scientists to create data-driven knowledge graphs that consolidate knowledge from the scientific literature across domains.
The iSchool student team, composed of PhD student Mengfei Lan and Informatics PhD students Maria Janina Sarol and Haoyang Liu, was tasked with developing an NLP system that could identify the concepts discussed in biomedical publications and relate them to one another, creating a knowledge graph for each publication that captures its scientific content.
"We were interested in the LitCoin challenge because it focused on extracting salient knowledge from biomedical publications, which is aligned with the research we do in our lab," said Sarol, the team lead. "Furthermore, the challenge was organized by NIH/NCATS, and they have plans to incorporate the models developed into their workflows, which is likely to increase impact. The fact that they also evaluated the models on the basis of reproducibility was another motivation for our participation, since this is another major research theme in our lab."
For the challenge, participants were given an NIH-developed dataset of published scientific research abstracts, annotated with knowledge assertions between concepts within those abstracts. They used this dataset to design and train NLP models that automatically generate knowledge assertions from the text of an abstract. Submissions were scored by a custom automated evaluator that measured the accuracy of the results generated by each participating system.
According to NIH/NCATS, the LitCoin Challenge aims to help data scientists better deploy their data-driven technology solutions toward accelerating scientific research in medicine, and to leverage data from biomedical publications to reach a wide range of biomedical researchers.