Diesner and Mishra publish paper on NER tool for social media research

Jana Diesner, Associate Professor and PhD Program Director
Shubhanshu Mishra

Named Entity Recognition (NER), the identification of proper names of people, organizations, and locations in raw text, can be highly accurate when researchers apply NER tools to large collections of well-formed text. However, existing NER tools often identify named entities poorly in social media text. Twitter text in particular frequently includes inconsistent capitalization, spelling errors, and shortened versions of words.

TwitterNER, an open-source tool developed by doctoral student Shubhanshu Mishra, who is supervised by Assistant Professor Jana Diesner, can help researchers who want to perform NER on social media text. In a recent independent evaluation by Humangeo, TwitterNER achieved higher precision than some other publicly available systems for the person, location, and organization entity types, which are often of most interest to researchers.

"Our system relies on a combination of hand-engineered features," explained Mishra. "It follows the paradigm of transductive semi-supervised learning where all the labeled and unlabeled data is utilized to make predictions about the unlabeled data."

The original implementation of TwitterNER was created for the shared task of the Workshop on Noisy User-generated Text (W-NUT) at the 2016 International Conference on Computational Linguistics (COLING). Workshop participants were asked to build an NER system for Twitter data, and all systems were evaluated on a common test dataset. TwitterNER achieved high precision relative to the other participating systems.

Diesner and Mishra then improved their approach and shared it with the W-NUT community by submitting the paper "Semi-supervised Named Entity Recognition in noisy-text."

"Our original submission ranked seventh in the task, but our final improved version surpassed the second-best performing system on the concluded task," said Mishra. "The winning system was based on deep learning, but its implementation is not publicly available."

Mishra holds an integrated BS and MS in mathematics and computing from the Indian Institute of Technology Kharagpur. He is interested in analyzing how information is generated in social networks, such as those found in scholarly data and on social media websites. His prior projects include systems for user sentiment profiling, active learning using a human-in-the-loop design pattern, and novelty profiling in scholarly data.

Diesner is an expert in human-centered computing, network science, natural language processing, and machine learning. Recognition of her research expertise includes appointments as CIO Scholar for Information Research & Technology at Illinois (2018), faculty fellow at the National Center for Supercomputing Applications (NCSA) at Illinois (2015), and research fellow in the Dori J. Maynard Senior Research Fellows program through The Center for Investigative Reporting and The Robert C. Maynard Institute for Journalism Education (2016). She holds a PhD from the Computation, Organizations and Society (COS) program at Carnegie Mellon University's School of Computer Science.

