Publication Date


Document Type


Committee Members

John Flach (Committee Member), Daniel Gruhl (Committee Member), Kevin Haas (Committee Member), Michael Raymer (Committee Member), Amit Sheth (Committee Chair), Shaojun Wang (Committee Member)

Degree Name

Doctor of Philosophy (PhD)


Over the last few years, there has been a growing public and enterprise fascination with 'social media' and its role in modern society. At the heart of this fascination is the ability of users to participate, collaborate, consume, create and share content via a variety of platforms such as blogs, micro-blogs, email, instant messaging services, social network services, collaborative wikis, social bookmarking sites, and multimedia sharing sites.

This dissertation is devoted to understanding informal user-generated textual content on social media platforms and using the results of the analysis to build Social Intelligence Applications.

The body of research presented in this dissertation focuses on understanding what a piece of user-generated content is 'About' through two sub-goals: Named Entity Recognition and Key Phrase Extraction on informal text. In light of the sparse context and informal nature of content on social media platforms, we investigate the role of contextual information from documents, domain models and the social medium in supplementing and improving the reliability and performance of existing text mining algorithms for Named Entity Recognition and Key Phrase Extraction.

In all cases, we find that using multiple contextual cues together leads to more reliable, interdependent decisions than using the cues in isolation, and that such improvements are robust across domains and content of varying characteristics, including micro-blogs like Twitter, social networking forums such as those on MySpace and Facebook, and blogs on the Web.

Finally, we showcase two deployed Social Intelligence applications that build on the results of Named Entity Recognition and Key Phrase Extraction algorithms to provide near real-time information about the pulse of an online populace. Specifically, we describe what it takes to build applications that exploit the 'wisdom of the crowds', highlighting challenges in data collection, processing informal English text, metadata extraction and presentation of the resulting information.

Page Count


Department or Program

Department of Computer Science and Engineering

Year Degree Awarded