School of Information Sciences

Diesner and Mishra publish paper on NER tool for social media research

Jana Diesner, Affiliate Associate Professor

The identification of proper names of people, organizations, and locations from raw texts, referred to as Named Entity Recognition (NER), can be highly accurate when researchers use NER tools on a large collection of text with proper syntax. However, using existing NER tools for analyzing social media text can lead to poor identification of named entities. In particular, Twitter text frequently includes inconsistent capitalization, spelling errors, and shortened versions of words.

TwitterNER, an open-source tool developed by doctoral student Shubhanshu Mishra, who is supervised by Assistant Professor Jana Diesner, can help researchers interested in performing NER on social media text. TwitterNER has recently been shown (in an independent evaluation by Humangeo) to perform better in terms of precision than some other publicly available systems for entity types of person, location, and organization, which are often of most interest to researchers.

"Our system relies on a combination of hand-engineered features," explained Mishra. "It follows the paradigm of transductive semi-supervised learning where all the labeled and unlabeled data is utilized to make predictions about the unlabeled data."

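The transductive idea Mishra describes can be illustrated with a toy sketch. The Python below is not TwitterNER's implementation; it is a minimal example that assumes a handful of made-up token features and a nearest-neighbor rule. The names `features`, `hamming`, and `transductive_label`, and the labels `ENT` and `O`, are hypothetical. Each unlabeled token receives the label of its closest already-labeled token and is then added to the pool, so later predictions draw on both labeled and previously unlabeled data.

```python
def features(token):
    """Hand-engineered binary features for one token (illustrative assumptions)."""
    return (
        token[:1].isupper(),              # starts with a capital letter
        token.isupper(),                  # written entirely in capitals
        token.startswith("@"),            # looks like a user mention
        token.startswith("#"),            # looks like a hashtag
        any(c.isdigit() for c in token),  # contains a digit
    )


def hamming(a, b):
    """Number of positions at which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def transductive_label(labeled, unlabeled):
    """Label every token in `unlabeled` using the labeled tokens plus
    any unlabeled tokens that have already been assigned a label."""
    pool = [(features(tok), lab) for tok, lab in labeled.items()]
    remaining = list(unlabeled)
    predictions = {}
    while remaining:
        # Find the unlabeled token that lies closest to any pooled example.
        _, index, label = min(
            ((hamming(features(tok), feats), i, lab)
             for i, tok in enumerate(remaining)
             for feats, lab in pool),
            key=lambda candidate: candidate[0],
        )
        token = remaining.pop(index)
        predictions[token] = label
        # The newly labeled token now helps label the remaining ones.
        pool.append((features(token), label))
    return predictions


labeled = {"Chicago": "ENT", "the": "O"}
unlabeled = ["Paris", "and", "NASA"]
predictions = transductive_label(labeled, unlabeled)
print(predictions)
```

In this sketch the predictions are made only for the specific unlabeled tokens supplied, which is what distinguishes a transductive setup from training a general-purpose model first and applying it to unseen text later.
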
The original implementation of TwitterNER was created for the shared-task session at the 2016 Conference on Computational Linguistics (COLING) Workshop on "Noisy User-generated Text" (W-NUT). Workshop participants were asked to build an NER system for Twitter data, which was evaluated using a common test dataset. TwitterNER had a high level of precision among the various systems.

Diesner and Mishra then improved their approach and shared it with W-NUT by submitting the paper, "Semi-supervised Named Entity Recognition in noisy-text."

"Our original submission ranked seventh in the task, but our final improved version surpassed the second-best performing system on the concluded task," said Mishra. "The winning system was based on deep learning, but its implementation is not publicly available."

Mishra has an integrated MS and BS in mathematics and computing from the Indian Institute of Technology Kharagpur. He is interested in the analysis of information generation in social networks such as those in scholarly data and social media websites. His prior projects have included systems for user sentiment profiling, active learning using a human-in-the-loop design pattern, and novelty profiling in scholarly data.

Diesner is an expert in human-centered computing, network science, natural language processing, and machine learning. Recognition for her research expertise includes appointments as CIO Scholar for Information Research & Technology at Illinois (2018), faculty fellow at the National Center for Supercomputing Applications (NCSA) at Illinois (2015), and research fellow in the Dori J. Maynard Senior Research Fellows program through The Center for Investigative Reporting and The Robert C. Maynard Institute for Journalism Education (2016). She holds a PhD from the Computation, Organizations and Society (COS) program at Carnegie Mellon University's School of Computer Science.

