Publication Date

2015

Document Type

Thesis

Committee Members

Derek Doran (Committee Member), Amit Sheth (Advisor), Krishnaprasad Thirunarayan (Committee Member)

Degree Name

Master of Science (MS)

Abstract

As the popularity of online social networking sites such as Twitter and Facebook continues to rise, the volume of textual content generated on the web is increasing rapidly. Mining user-generated content in social media has proven effective in domains ranging from personalization and recommendation systems to crisis management. These applications stand to be further enhanced by incorporating information about the geo-position of social media users in their analysis. Due to privacy concerns, however, users are largely reluctant to share their location information. As a consequence, researchers have focused on automatically inferring location information from the contents of a user's tweets. Existing approaches are purely data-driven and require large training data sets of geotagged tweets. Furthermore, these approaches rely solely on social media features or probabilistic language models and fail to capture the underlying semantics of the tweets. In this thesis, we propose a novel knowledge-based approach that does not require any training data. Our approach uses Wikipedia, a crowd-sourced knowledge base, to extract entities that are relevant to a location. We refer to these entities as local entities. Additionally, we score the relevance of each local entity with respect to the city, using the Wikipedia Hyperlink Graph. We predict the most likely location of a user by matching the scored entities of each city against the entities mentioned by the user in their tweets. We evaluate our approach on a publicly available data set consisting of 5119 Twitter users across the continental United States and show accuracy comparable to that of state-of-the-art approaches. Our results demonstrate the ability to pinpoint the location of a Twitter user to a state and a city using Wikipedia, without needing to train a probabilistic model.
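
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `predict_city`, the toy relevance scores, and the choice to sum each matched entity's score weighted by its mention count are all assumptions made for illustration. The abstract only states that scored local entities of a city are matched against the entities a user mentions.

```python
from collections import Counter


def predict_city(user_entities, city_entity_scores):
    """Pick the city whose scored local entities best match a user's entities.

    user_entities: list of entity names mentioned in the user's tweets
        (assumed to be already extracted and linked to Wikipedia titles).
    city_entity_scores: dict mapping city -> {local entity -> relevance score}.
    Returns (best_city, totals); best_city is None if nothing matches.
    """
    mention_counts = Counter(user_entities)
    totals = {}
    for city, scores in city_entity_scores.items():
        # Assumed aggregation: sum of relevance scores, weighted by mentions.
        totals[city] = sum(
            scores.get(entity, 0.0) * count
            for entity, count in mention_counts.items()
        )
    if not totals or max(totals.values()) == 0.0:
        return None, totals
    best_city = max(totals, key=totals.get)
    return best_city, totals


# Hypothetical relevance scores, not taken from the thesis.
TOY_SCORES = {
    "Chicago": {"Willis Tower": 0.9, "Chicago Cubs": 0.8, "Lake Michigan": 0.4},
    "Milwaukee": {"Lake Michigan": 0.5, "Milwaukee Bucks": 0.9},
}

if __name__ == "__main__":
    mentions = ["Chicago Cubs", "Lake Michigan", "Chicago Cubs"]
    city, totals = predict_city(mentions, TOY_SCORES)
    print(city, totals)
```

In this toy run the user mentions one entity shared by both cities and one specific to Chicago, so the Chicago total is higher. A user whose mentions match no local entity yields `None` rather than an arbitrary city.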

Page Count

69

Department or Program

Department of Computer Science

Year Degree Awarded

2015

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

