Publication Date

2016

Document Type

Thesis

Committee Members

Michelle Cheatham (Advisor), Mateen Rizki (Committee Member), Krishnaprasad Thirunarayan (Committee Member)

Degree Name

Master of Science in Computer Engineering (MSCE)

Abstract

As the Semantic Web grows, so does the number of ontologies used to structure the data within it. Aligning these ontologies is critical to fully realizing the potential of the web. Previous work in ontology alignment has shown that even alignment systems utilizing basic string similarity metrics can produce useful matches. Researchers speculate that including semantic as well as syntactic information inherent in entity labels can further improve alignment results. This thesis examines that hypothesis by exploring the utility of using Wikipedia as a source of semantic information. Various elements of Wikipedia are considered, including article content, page terms, and search snippets. The utility of each information source is analyzed, and a composite system, WikiMatcher, is created based on this analysis. The performance of WikiMatcher is compared to that of a basic string-based alignment system on two established alignment benchmarks and two other real-world datasets. The extensive evaluation shows that although WikiMatcher performs similarly to the string metric overall, it is able to find many matches with no syntactic similarity between labels. This performance seems to be driven by Wikipedia's query resolution and page redirection system, rather than by the particular information from Wikipedia that is used to compare entities.

Page Count

59

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2016
