Several GSLIS affiliates presented this week at HASTAC 2015, the annual conference of the Humanities, Arts, Sciences, and Technology Alliance and Collaboratory. Held at Michigan State University May 28-29, HASTAC 2015 featured presentations relating to the theme, "Art and Science of Digital Humanities."
GSLIS postdoctoral research associate Sayan Bhattacharyya presented an interactive session titled "Workshop on Text Analytics with the HathiTrust Research Center: An Introduction to Tools for Working with Digitized Text and Metadata."
From the abstract: This workshop is intended for a broad audience ranging from curious graduate students exploring digital humanities to the experienced text mining researcher. The workshop will provide a hands-on introduction to the HathiTrust Digital Library collection and its metadata, and to the tools and functionalities developed by the HTRC that leverage these resources. Through the concrete instances of the HTRC tools, the workshop will orient attendees to the new challenges and opportunities that the ability to carry out algorithmic text analysis at such a large scale presents to researchers. The workshop will cover the Secure Hathi Analytics Research Commons (SHARC), the HathiTrust+Bookworm (HT+BW) tool and the HTRC Extracted Features Dataset. Attendees will be shown how to build their own worksets (small, customized subcorpora from the HathiTrust Digital Library corpus) and how to conduct analyses on worksets. There will also be group discussion involving all attendees about the emerging questions that these novel developments are likely to inaugurate in their own fields and about how these developments can affirm or disrupt (or both affirm and disrupt simultaneously) established practices of inquiry.
Craig Evans, a doctoral student in informatics advised by GSLIS Professor Michael Twidale, presented research conducted with Associate Professor Cathy Blake and Assistant Professor Jana Diesner. His presentation, titled "Email Data Analysis as an Alternate Lens into Historical Events," addressed the challenges of analyzing email data.
From the abstract: One particular challenge with email data specifically that also relates to other types of authorship data and social media data is the mapping of email addresses to individuals. This matters because a) many people use more than one email address and b) email addresses might refer to actual individuals versus larger collectives.
Using the Enron email data, which contain about 400K emails spanning more than three years, we show how data provenance techniques...have a large impact on the insights we gain from analyzing these data. We do this by showing the differences in substantive knowledge gained about the social dynamics in this organization that are due to various data consolidation techniques instead of actual social dynamics. We will also provide an approximation of the “true” picture of these dynamics as reflected in the underlying data based on associating email addresses with actual people as much as possible.
We show how exploiting information from email headers to build time-stamped explicit social networks can be combined with analyzing the content of text bodies through natural language processing techniques to better understand the flow of knowledge and information in a social system.
GSLIS master’s student Rezvaneh Rezapour gave a presentation titled "Computational Impact Assessment of Documentaries and Related Media," in which she discussed ongoing research she is conducting with Assistant Professor Jana Diesner.
From the abstract: We present our work from developing and applying a theoretically grounded, empirical, and computational methodology for assessing and comparing the impact of information products in a systematic and rigorous fashion. This work started with assessing the impact of social justice documentaries. Unlike media products whose impact can be measured in metrics like ticket sales or numbers of viewers, social justice documentaries pose a particular challenge because their aim is to create some type of social change. We have developed a theoretical framework and pertinent technology that enables people to a) collect data from a variety of sources, including media and social media, b) construct a baseline model of key stakeholders and their opinions associated with the main issues addressed in a documentary, c) track changes in the baseline over time, and d) identify which changes might be attributable to the content of a documentary (ground truth model) and/or its coverage in (social) media. We will give a brief overview of this process and discuss in more detail how this work has been used by filmmakers.