The highly technical language used in biomedical publications makes it difficult for nonexpert audiences to fully understand their content and draw insights. The BioLaySumm competition focuses on making biomedical research publications more accessible to lay audiences. This year, the winning team was a group from Associate Professor Halil Kilicoglu's research lab: PhD students Zhiwen (Jerome) You and Shufan Ming and Computer Science master's student Shruthan Radhakrishna. Their work was presented at the 62nd Annual Meeting of the Association for Computational Linguistics.
For the competition, the organizers provided teams with summarization datasets containing human-written plain language summaries for a set of biomedical publications. These datasets allowed the development and evaluation of novel, state-of-the-art natural language processing (NLP) methods. The generated summaries were evaluated on their relevance, readability, and faithfulness to the original publication. Fifty-three teams from around the world participated in the 2024 competition, and Kilicoglu's team ranked first overall and first on the relevance metric.
According to Kilicoglu, his research group first identified the most relevant sentences in the full text of each biomedical publication and then fine-tuned a large language model on the title, the abstract, and the extracted sentences. "We obtained the best results with this approach," he said. "Alternatively, we also fine-tuned a smaller language model (Longformer) and incorporated general knowledge from Wikipedia to inform the plain language summaries. This approach was less efficient but still one of the best models reported."
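To make the extract-then-summarize idea concrete, here is a minimal illustrative sketch in Python. It is not the team's implementation: the article does not say how sentence relevance was scored, so this sketch assumes a simple word-overlap (cosine) similarity between each sentence and the title plus abstract. The function names `select_sentences` and `build_model_input` and the parameter `k` are hypothetical, and the fine-tuning step itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(query, full_text, k=2):
    """Return the k sentences most similar to the query, in document order.

    Assumption: relevance is approximated by word overlap with the query.
    """
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", full_text) if s.strip()]
    query_bag = Counter(tokenize(query))
    scored = [(cosine(query_bag, Counter(tokenize(s))), i) for i, s in enumerate(sentences)]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


def build_model_input(title, abstract, full_text, k=2):
    """Concatenate title, abstract, and extracted sentences into one input string."""
    extracted = select_sentences(title + " " + abstract, full_text, k)
    return "\n".join([title, abstract, *extracted])
```

In a pipeline of this shape, the string returned by `build_model_input` would be paired with the human-written plain language summary as a training example for the language model. A real system would likely use a stronger relevance scorer and a proper sentence splitter.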
Kilicoglu's research interests include biomedical informatics, natural language processing, knowledge representation, scholarly communication, and scientific reproducibility. He holds a PhD in computer science from Concordia University.