A new informatics approach developed by Associate Professor Catherine Blake and Rebecca Kehm, a post-doctoral fellow at Columbia University Mailman School of Public Health, will assist physicians and researchers in their systematic review of medical literature. While previous automated methods to identify outcomes have extracted sentences, Blake and Kehm introduce a more precise method that extracts noun phrases rather than the entire sentence.
A sentence can contain more than one outcome and can include both outcome and non-outcome components, which makes a noun phrase a better unit than a sentence for literature reviews. The researchers used machine learning to automatically detect new outcomes (endpoints) from the methods section of 88K MEDLINE abstracts and found that 96.7% of the outcomes could be represented as a noun phrase. The results also suggest that structural information about how authors communicate outcomes, in particular their use of lists, gave better performance than a machine learning approach.
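To illustrate why a phrase is a finer unit than a sentence, and how list structure can expose outcomes, here is a minimal Python sketch. It is not the method from the paper: the cue pattern, the function name `candidate_outcomes`, and the sample sentence are all assumptions made up for this illustration, and a real system would need proper noun-phrase parsing.

```python
import re
from typing import List

# Hypothetical cue pattern (an assumption, not from the paper): words that
# often introduce a list of outcomes in a methods sentence.
CUES = re.compile(r"\b(?:endpoints?|outcomes?)\s+(?:were|was|included)\s+", re.IGNORECASE)

# Separators between list items: commas (optionally followed by and/or),
# semicolons, or a bare "and"/"or" between two items.
SEPARATORS = re.compile(r",\s*(?:and\s+|or\s+)?|;\s*|\s+and\s+|\s+or\s+")


def candidate_outcomes(sentence: str) -> List[str]:
    """Return the list items that follow an outcome cue, or [] if no cue is found."""
    match = CUES.search(sentence)
    if match is None:
        return []
    # Keep only the text after the cue and drop the trailing period.
    tail = sentence[match.end():].rstrip(". ")
    return [part.strip() for part in SEPARATORS.split(tail) if part.strip()]


# Made-up example sentence, not taken from the MEDLINE corpus.
example = "Primary endpoints were overall survival, disease-free survival, and toxicity."
print(candidate_outcomes(example))
# → ['overall survival', 'disease-free survival', 'toxicity']
```

A sentence-level extractor would return the whole example sentence as a single hit, whereas splitting on the list yields three separate outcomes that can each be counted and compared across treatments.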
The resulting paper, "Comparing breast cancer treatments using automatically detected surrogate and clinically relevant outcomes entities from text," was published in the Journal of Biomedical Informatics (vol. 1, March 2019).
"This is an exciting step forward," Blake said. "The increased precision in using noun phrases rather than a sentence enables us to compare different treatment strategies with respect to clinically relevant or surrogate endpoints."
From their review of medical literature related to breast cancer treatments, Blake and Kehm found that the most clinically relevant outcome (overall survival) is not the most frequently reported outcome for all treatments. For example, disease-free survival is reported more often than overall survival in hormone therapy abstracts.
Blake's research interests include biomedical informatics, natural language processing, evidence-based discovery, learning health systems, socio-technical systems, and data analytics. In addition to her professorial role, she serves the iSchool as associate director of the Center for Informatics Research in Science and Scholarship and as program director of the MS in information management and the MS in bioinformatics. Prior to coming to Illinois, she was a faculty member at the School of Information and Library Science at the University of North Carolina, Chapel Hill, a research scientist, and an applications programmer. Blake holds a PhD and MS in information and computer science from the University of California, Irvine, and an MS and BS in computer science from the University of Wollongong, Australia.