Publication Date

2016

Document Type

Thesis

Committee Members

Tanvi Banerjee (Committee Member), Amit P. Sheth (Advisor), Krishnaprasad Thirunarayan (Committee Member)

Degree Name

Master of Science (MS)

Abstract

Social networking sites like Twitter and Facebook have become a significant source of user-generated content in the past decade. Mining this user-generated content has proved beneficial for a broad range of applications such as Event Extraction, Document Retrieval, and Sentiment Analysis. Identifying entities is one of the major tasks that supplies important information to these applications. Entities are typically identified in two steps: Named Entity Recognition (NER) and Entity Linking. State-of-the-art NER solutions focus on recognizing entities that are mentioned explicitly in social media posts. However, entities are frequently mentioned implicitly in such posts. For example, the tweet 'Didn't know that its the same actress in Fault in our stars and Divergent.' contains explicit references to the movies Fault in our stars and Divergent, while it implicitly refers to the actress Shailene Woodley. Spotting and classifying tweets with such implicit entity mentions (i.e., recognizing that the above tweet has an implicit entity of type ACTRESS) is the initial step towards identifying the implicit mention of Shailene Woodley in this tweet.

In this thesis, we propose a two-step, semantics-driven approach to spotting and typing implicit entity mentions in text. Specifically, we answer two research questions: 1. How can we find tweets that have implicit entity mentions of a given type? 2. What features help to distinguish tweets with implicit entity mentions from tweets with explicit entity mentions and tweets with no entity mentions at all? We answer the first question by developing a technique to find semantic cues that indicate the presence of implicit entity mentions in tweets. We answer the second question by exploiting the syntactic features of the tweets, along with semantic features extracted from crowd-sourced knowledge bases like Wikipedia and DBpedia, to determine whether or not a tweet has an implicit entity mention. We evaluate our approach by creating a gold standard dataset for two domains, namely movies and books.

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2016

Included in

Computer Sciences Commons

Identifying Tweets with Implicit Entity Mentions
