Document Type

Dissertation

Publication Date

2013

Abstract

Recent years have seen increased interest in knowledge repositories that are useful across applications, in contrast to the creation of ad hoc or application-specific databases.
These knowledge repositories act as central providers of unambiguous identifiers and semantic relationships between entities. As such, these shared entity descriptions serve as a common vocabulary to exchange and organize information in different formats and for different purposes. Therefore, there has been remarkable interest in systems that can automatically tag textual documents with identifiers from shared knowledge repositories, so that the content of those documents is described in a vocabulary that is unambiguously understood across applications.

Tagging textual documents according to these knowledge bases is a challenging task. It involves recognizing the entities and concepts that have been mentioned in a particular passage and attempting to resolve any ambiguity of language in order to choose one of many possible meanings for a phrase. There has been substantial work on recognizing and disambiguating entities for specialized applications, or constrained to limited entity types and particular types of text. In the context of shared knowledge bases, since each application has potentially very different needs, systems must have unprecedented breadth and flexibility to ensure their usefulness across applications. Documents may exhibit different language and discourse characteristics, discuss very diverse topics, or require a focus on parts of the knowledge repository that are inherently harder to disambiguate. In practice, for developers looking for a system to support their use case, it is often unclear whether an existing solution is applicable, leading those developers to trial and error and ad hoc usage of multiple systems in an attempt to achieve their objective.

In this dissertation, I propose a conceptual model that unifies related techniques in this space under a common multi-dimensional framework. The framework makes the strengths and limitations of each technique explicit, supporting developers in their search for a suitable tool for their needs. Moreover, the model serves as the basis for the development of flexible systems that are able to support document tagging for different use cases. I describe such an implementation, DBpedia Spotlight, along with extensions that we made to the DBpedia knowledge base to support this implementation. I report evaluations of this tool on several well-known data sets, and demonstrate applications to diverse use cases for further validation.

Comments

Video of Mendes' defense can be found at http://youtu.be/yLz-OeM2Q1I.

Presentation slides from Mendes' defense can be found at http://www.slideshare.net/knoesis/defense-28900047?ref=http://knoesis.org/aboutus/thesis_defense.

Repository Citation

Mendes, P. N. (2013). Adaptive Semantic Annotation of Entity and Concept Mentions in Text.
https://corescholar.libraries.wright.edu/knoesis/1033

Included in

Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, Science and Technology Studies Commons

Kno.e.sis Publications

Adaptive Semantic Annotation of Entity and Concept Mentions in Text