Diesner and Mishra publish paper on NER tool for social media research

Jana Diesner, Associate Professor
Shubhanshu Mishra

Named Entity Recognition (NER), the identification of the proper names of people, organizations, and locations in raw text, can be highly accurate when researchers apply NER tools to large collections of text with standard syntax. However, existing NER tools often perform poorly on social media text. Twitter posts in particular frequently include inconsistent capitalization, spelling errors, and abbreviated words.

TwitterNER, an open-source tool developed by doctoral student Shubhanshu Mishra under the supervision of Assistant Professor Jana Diesner, can help researchers perform NER on social media text. An independent evaluation by HumanGeo recently found that TwitterNER achieves higher precision than some other publicly available systems for person, location, and organization entities, the entity types that are often of most interest to researchers.

"Our system relies on a combination of hand-engineered features," explained Mishra. "It follows the paradigm of transductive semi-supervised learning where all the labeled and unlabeled data is utilized to make predictions about the unlabeled data."
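To make the idea in Mishra's description a little more concrete, here is a minimal Python sketch. It is not TwitterNER's code: the feature names, the whitespace tokenizer, and the `corpus_cap_ratio` feature are hypothetical illustrations chosen for this example. It shows a few hand-engineered token features alongside one feature computed over labeled and unlabeled tweets together, which is the sense in which a transductive setup lets the unlabeled data inform predictions about that same data.

```python
from collections import Counter


def corpus_cap_ratio(tweets):
    """For each lowercased token, the fraction of its occurrences that start
    with an uppercase letter, counted over ALL tweets passed in.

    Passing labeled and unlabeled tweets together is the transductive step:
    the unlabeled tweets we want to tag also shape the feature values.
    """
    total = Counter()
    capitalized = Counter()
    for tweet in tweets:
        for tok in tweet.split():  # simplistic whitespace tokenizer (assumption)
            key = tok.lower()
            total[key] += 1
            if tok[:1].isupper():
                capitalized[key] += 1
    return {word: capitalized[word] / total[word] for word in total}


def token_features(tok, cap_ratio):
    """Hypothetical hand-engineered features for a single token."""
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "has_digit": any(ch.isdigit() for ch in tok),
        "suffix3": tok[-3:].lower(),
        # Corpus-level signal: can help when a tweet drops a capital letter.
        "corpus_cap_ratio": cap_ratio.get(tok.lower(), 0.0),
    }


if __name__ == "__main__":
    labeled = ["Going to Chicago tomorrow"]
    unlabeled = ["chicago weather is crazy", "Chicago Bulls win"]
    ratio = corpus_cap_ratio(labeled + unlabeled)
    print(token_features("chicago", ratio))
```

Here the lowercase "chicago" in an unlabeled tweet still receives a high capitalization ratio because the same word appears capitalized elsewhere in the combined corpus. In a complete system, feature dictionaries like these would typically be passed to a sequence labeler such as a conditional random field; that pairing is also an assumption of this sketch.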

The original implementation of TwitterNER was created for the shared task of the Workshop on Noisy User-generated Text (W-NUT) at the 2016 International Conference on Computational Linguistics (COLING). Participants were asked to build an NER system for Twitter data, and all systems were evaluated on a common test dataset. TwitterNER achieved high precision relative to the other participating systems.

Diesner and Mishra then improved their approach and described it in the paper "Semi-supervised Named Entity Recognition in noisy-text," which they submitted to W-NUT.

"Our original submission ranked seventh in the task, but our final improved version surpassed the second-best performing system on the concluded task," said Mishra. "The winning system was based on deep learning, but its implementation is not publicly available."

Mishra holds an integrated MS and BS in mathematics and computing from the Indian Institute of Technology Kharagpur. He is interested in analyzing how information is generated in social networks, such as those found in scholarly data and on social media websites. His prior projects include systems for user sentiment profiling, active learning using a human-in-the-loop design pattern, and novelty profiling in scholarly data.

Diesner is an expert in human-centered computing, network science, natural language processing, and machine learning. Recognition of her research expertise includes appointments as CIO Scholar for Information Research & Technology at Illinois (2018), as faculty fellow at the National Center for Supercomputing Applications (NCSA) at Illinois (2015), and as a research fellow in the Dori J. Maynard Senior Research Fellows program through The Center for Investigative Reporting and The Robert C. Maynard Institute for Journalism Education (2016). She holds a PhD from the Computation, Organizations and Society (COS) program at Carnegie Mellon University's School of Computer Science.
