Derek Doran (Committee Member), Amit Sheth (Advisor), Krishnaprasad Thirunarayan (Committee Member)
Master of Science (MS)
As the popularity of online social networking sites such as Twitter and Facebook continues to rise, the volume of textual content generated on the web is increasing rapidly. Mining user-generated content in social media has proven effective in domains ranging from personalization and recommendation systems to crisis management. These applications stand to be further enhanced by incorporating the geo-position of social media users into their analysis. However, due to privacy concerns, users are largely reluctant to share their location information. Consequently, researchers have focused on automatically inferring location from the content of a user's tweets. Existing approaches are purely data-driven and require large training data sets of geotagged tweets. Furthermore, these approaches rely solely on social media features or probabilistic language models and fail to capture the underlying semantics of the tweets. In this thesis, we propose a novel knowledge-based approach that does not require any training data. Our approach uses Wikipedia, a crowdsourced knowledge base, to extract entities that are relevant to a location. We refer to these entities as local entities. Additionally, we use the Wikipedia hyperlink graph to score the relevance of each local entity with respect to its city. We predict the most likely location of a user by matching the scored entities of each city against the entities the user mentions in their tweets. We evaluate our approach on a publicly available data set of 5,119 Twitter users across the continental United States and show accuracy comparable to state-of-the-art approaches. Our results demonstrate that Wikipedia alone can pinpoint a Twitter user's location to a state and a city, without training a probabilistic model.
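The final matching step described above can be illustrated with a minimal sketch. It assumes that each city's local entities have already been extracted from Wikipedia and scored using the hyperlink graph, and that the entities mentioned in a user's tweets have already been identified. The function name `predict_city`, the input shapes, and the aggregation rule (summing each matched entity's score, weighted by how often the user mentions it) are hypothetical choices for illustration, not the thesis's exact scoring method. The example data is made up.

```python
from collections import Counter


def predict_city(city_entity_scores, tweet_entities):
    """Return the city whose scored local entities best match a user's mentions.

    city_entity_scores: dict mapping city -> {entity: relevance score}
        (hypothetical: assumed to be precomputed from the Wikipedia hyperlink graph).
    tweet_entities: iterable of entity names found in the user's tweets.
    Returns None when no mentioned entity is a local entity of any city.
    """
    # Count mentions so that frequently mentioned entities weigh more
    # (an assumption of this sketch, not a detail from the abstract).
    mention_counts = Counter(tweet_entities)

    best_city, best_score = None, 0.0
    for city, scores in city_entity_scores.items():
        # Sum the relevance scores of entities shared by the city and the user.
        total = sum(scores.get(entity, 0.0) * count
                    for entity, count in mention_counts.items())
        if total > best_score:
            best_city, best_score = city, total
    return best_city


# Illustrative, made-up scores for two cities.
example_scores = {
    "Dayton, OH": {"Wright State University": 0.9, "Dayton Flyers": 0.8},
    "Columbus, OH": {"Ohio State University": 0.9, "Columbus Crew": 0.7},
}

if __name__ == "__main__":
    mentions = ["Dayton Flyers", "Wright State University", "Columbus Crew"]
    print(predict_city(example_scores, mentions))
```

Because no model is trained, adding a new city in this sketch only requires adding its scored entity dictionary; the prediction is a simple argmax over cities.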
Department or Program
Department of Computer Science
Year Degree Awarded
Copyright 2015, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.