Publication Date
2016
Document Type
Dissertation
Committee Members
Keke Chen (Committee Member), Justin Martineau (Committee Member), Amit Sheth (Advisor), Krishnaprasad Thirunarayan (Committee Member), Ingmar Weber (Committee Member)
Degree Name
Doctor of Philosophy (PhD)
Abstract
Web 2.0 and social media enable people to create, share, and discover information instantly, anywhere and anytime. A great amount of this information is subjective information -- information about people's subjective experiences, ranging from feelings about what is happening in our daily lives to opinions on a wide variety of topics. Subjective information is useful to individuals, businesses, and government agencies in supporting decision making in areas such as product purchase, marketing strategy, and policy making. However, much useful subjective information is buried in ever-growing user-generated data on social media platforms, and it is still difficult to extract high-quality subjective information and make full use of it with current technologies. Current subjectivity and sentiment analysis research has largely focused on classifying text polarity -- whether the opinion expressed regarding a specific topic in a given text is positive, negative, or neutral. This narrow definition does not take into account other types of subjective information such as emotion, intent, and preference, which may prevent them from being exploited to their full potential. This dissertation extends the definition and introduces a unified framework for mining and analyzing diverse types of subjective information. We have identified four components of a subjective experience: an individual who holds it, a target that elicits it (e.g., a movie or an event), a set of expressions that describe it (e.g., "excellent", "exciting"), and a classification or assessment that characterizes it (e.g., positive vs. negative). Accordingly, this dissertation contributes novel and general techniques for identifying and extracting these components. We first explore the task of extracting sentiment expressions from social media posts. 
We propose an optimization-based approach that extracts a diverse set of sentiment-bearing expressions, including formal and slang words and phrases, for a given target from an unlabeled corpus. Instead of assigning an overall sentiment to a given text, this method assesses the more fine-grained, target-dependent polarity of each sentiment expression. Unlike pattern-based approaches, which often fail to capture the diversity of sentiment expressions because of the informal language and writing style of social media posts, the proposed approach can identify sentiment phrases of different lengths as well as slang expressions, including abbreviations and spelling variations. Unlike supervised approaches, which require data annotation when applied to a new domain, the proposed approach is unsupervised and is therefore highly portable to new domains. We then look into the task of finding opinion targets in product reviews, where product features (product attributes and components) are usually the targets of opinions. We propose a clustering approach that identifies product features and groups them into aspect categories. Unlike many existing approaches that first extract features and then group them into categories, the proposed approach identifies features and clusters them into aspects simultaneously. In addition, prior work on feature extraction tends to require seed terms and focuses on identifying explicit features, while the proposed approach extracts both explicit and implicit features and does not require seed terms. Finally, we study the classification and assessment of several types of subjective information (e.g., sentiment, political preference, subjective well-being) in two specific application scenarios. One application is to predict election results by analyzing the sentiments of social media users towards election candidates. 
We observe that users' differing political preferences and tweeting behavior may have a significant effect on predicting election results. We pr...
Page Count
161
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2016
Copyright
Copyright 2016, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
ORCID ID
http://orcid.org/0000-0002-5497-4690