Diesner and Mishra publish paper on NER tool for social media research

Jana Diesner, Associate Professor

The identification of proper names of people, organizations, and locations from raw texts, referred to as Named Entity Recognition (NER), can be highly accurate when researchers use NER tools on a large collection of text with proper syntax. However, using existing NER tools for analyzing social media text can lead to poor identification of named entities. In particular, Twitter text frequently includes inconsistent capitalization, spelling errors, and shortened versions of words.
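A toy illustration of why capitalization matters: the sketch below is a deliberately naive heuristic, not how TwitterNER or any real NER tool works. The function name and both example sentences are invented for illustration. It treats any capitalized token after the first word as an entity candidate, which works on edited text and finds nothing in a lowercased tweet.

```python
def capitalized_candidates(text):
    """Naive heuristic (illustrative only): return capitalized tokens,
    skipping the first token because sentences start with a capital anyway."""
    tokens = text.split()
    return [
        tok.strip(".,!?")
        for i, tok in enumerate(tokens)
        if i > 0 and tok[:1].isupper()
    ]


# Hypothetical example sentences, not taken from any dataset.
edited_text = "Yesterday Alice Smith visited Chicago."
tweet_text = "yesterday alice smith visited chicago"

edited_hits = capitalized_candidates(edited_text)
tweet_hits = capitalized_candidates(tweet_text)
print(edited_hits)
print(tweet_hits)
```

The same names appear in both sentences, but the heuristic recovers them only from the properly capitalized version.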

TwitterNER, an open-source tool developed by doctoral student Shubhanshu Mishra, who is supervised by Assistant Professor Jana Diesner, can help researchers interested in performing NER on social media text. TwitterNER has recently been shown (in an independent evaluation by Humangeo) to perform better in terms of precision than some other publicly available systems for entity types of person, location, and organization, which are often of most interest to researchers.

"Our system relies on a combination of hand-engineered features," explained Mishra. "It follows the paradigm of transductive semi-supervised learning where all the labeled and unlabeled data is utilized to make predictions about the unlabeled data."
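The transductive idea Mishra describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not TwitterNER's implementation: it swaps in a toy nearest-centroid classifier with self-training, and the two-dimensional feature vectors, the labels "O" and "ENTITY", and all function names are hypothetical. The point it shows is that the unlabeled items themselves help shape the model that labels them.

```python
def centroid(points):
    """Mean of a non-empty list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transductive_self_training(labeled, unlabeled, rounds=5):
    """labeled: list of (features, label); unlabeled: list of features.
    Returns predicted labels for the given unlabeled points only."""
    pseudo = [None] * len(unlabeled)
    for _ in range(rounds):
        # Group labeled points, plus any pseudo-labeled points, by class.
        groups = {}
        for x, y in labeled:
            groups.setdefault(y, []).append(x)
        for x, y in zip(unlabeled, pseudo):
            if y is not None:
                groups[y].append(x)
        centroids = {y: centroid(xs) for y, xs in groups.items()}
        # Re-label every unlabeled point with its nearest class centroid.
        new_pseudo = [
            min(centroids, key=lambda y: sq_dist(x, centroids[y]))
            for x in unlabeled
        ]
        if new_pseudo == pseudo:  # assignments stopped changing
            break
        pseudo = new_pseudo
    return pseudo


# Hypothetical toy data: one labeled example per class, four unlabeled points.
labeled = [((0.0, 0.0), "O"), ((1.0, 1.0), "ENTITY")]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.3), (0.7, 0.9)]
predictions = transductive_self_training(labeled, unlabeled)
print(predictions)
```

Unlike an inductive model, which is trained once and then applied to unseen data, this sketch only produces labels for the specific unlabeled points it was given.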

The original implementation of TwitterNER was created for the shared-task session at the 2016 International Conference on Computational Linguistics (COLING) Workshop on "Noisy User-generated Text" (W-NUT). Workshop participants were asked to build an NER system for Twitter data, which was evaluated using a common test dataset. TwitterNER had a high level of precision among the various systems.

Diesner and Mishra then improved their approach and shared it with W-NUT by submitting the paper, "Semi-supervised Named Entity Recognition in noisy-text."

"Our original submission ranked seventh in the task, but our final improved version surpassed the second-best performing system on the concluded task," said Mishra. "The winning system was based on deep learning, but its implementation is not publicly available."

Mishra has an integrated MS and BS in mathematics and computing from the Indian Institute of Technology Kharagpur. He is interested in the analysis of information generation in social networks such as those in scholarly data and social media websites. His prior projects have included systems for user sentiment profiling, active learning using a human-in-the-loop design pattern, and novelty profiling in scholarly data.

Diesner is an expert in human-centered computing, network science, natural language processing, and machine learning. Recognition for her research expertise includes appointments as CIO Scholar for Information Research & Technology at Illinois (2018), faculty fellow at the National Center for Supercomputing Applications (NCSA) at Illinois (2015), and research fellow in the Dori J. Maynard Senior Research Fellows program through The Center for Investigative Reporting and The Robert C. Maynard Institute for Journalism Education (2016). She holds a PhD from the Computation, Organizations and Society (COS) program at Carnegie Mellon University's School of Computer Science.

