Tanvi Banerjee (Committee Member), Amit P. Sheth (Advisor), Krishnaprasad Thirunarayan (Committee Member)
Master of Science (MS)
Social networking sites like Twitter and Facebook have become a significant source of user-generated content in the past decade. Mining this user-generated content has proved beneficial for a broad range of applications such as Event Extraction, Document Retrieval, and Sentiment Analysis. Identifying entities is one of the major tasks that supply important information to these applications. Identification of entities is typically performed in two steps: Named Entity Recognition (NER) and Entity Linking. State-of-the-art NER solutions focus on recognizing entities that are mentioned explicitly in social media posts. However, entities are also frequently mentioned implicitly. For example, the tweet 'Didn't know that its the same actress in Fault in our stars and Divergent.' contains explicit references to the movies Fault in our stars and Divergent, while it implicitly refers to the actress Shailene Woodley. Spotting and classifying tweets with such implicit entity mentions (i.e., recognizing that the above tweet has an implicit entity of type ACTRESS) is the initial step towards identifying the implicit mention of Shailene Woodley in this tweet.
In this thesis, we propose a two-step, semantics-driven approach to address the spotting and typing of implicit entity mentions in text. Specifically, we answer two research questions: 1. How can we find tweets that have implicit entity mentions of a given type? 2. What features help to distinguish tweets with implicit entity mentions from tweets with explicit entity mentions and tweets with no entity mentions at all? We answer the first question by developing a technique to find semantic cues that indicate the presence of implicit entity mentions in tweets. We answer the second question by exploiting the syntactic features of the tweets, along with semantic features extracted from crowd-sourced knowledge bases like Wikipedia and DBpedia, to determine whether a tweet has an implicit entity mention. We evaluate our approach by creating a gold standard dataset for two domains, namely movies and books.
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright, all rights reserved. My ETD will be available under the "Fair Use" terms of copyright law.