Krishnaprasad Thirunarayan (Committee Member), Isabel Cruz (Committee Member), Pascal Hitzler (Advisor), Mateen Rizki (Committee Member)
Doctor of Philosophy (PhD)
Ontology alignment is an important step in enabling computers to query and reason across the many linked datasets on the Semantic Web. It is a difficult challenge because the ontologies underlying different linked datasets can vary in subject area coverage, level of abstraction, modeling philosophy, and even language. The alignment approach presented here centers on string similarity metrics. Nearly all ontology alignment systems use a string similarity metric in one form or another, yet the choice of a particular metric often appears arbitrary. We begin this dissertation with the most comprehensive survey to date of the performance of string similarity metrics and string preprocessing strategies for ontology alignment. Based on this work, we present practical guidelines for choosing string metrics for different types of ontologies and different alignment goals. Additionally, we show that string similarity metrics alone can perform competitively with state-of-the-art alignment systems on the most popular benchmarks in the field.
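To make the pairing of a preprocessing strategy with a string similarity metric concrete, here is a minimal sketch. It assumes one particular preprocessing strategy (splitting camelCase, treating underscores and hyphens as spaces, lowercasing) and one particular metric (normalized Levenshtein similarity). These are illustrative choices, not the metrics or strategies the survey recommends. The function names and the 0.8 threshold are hypothetical.

```python
import re


def preprocess(label: str) -> str:
    """Hypothetical preprocessing: split camelCase, treat '_'/'-' as spaces, lowercase."""
    # Insert a space at each lowercase-to-uppercase boundary, e.g. "hasAuthor" -> "has Author".
    label = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    # Treat runs of underscores and hyphens as word separators.
    label = re.sub(r"[_\-]+", " ", label)
    # Collapse repeated whitespace and lowercase.
    return " ".join(label.split()).lower()


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def align(source, target, threshold=0.8):
    """Match each source label to its most similar target label; keep pairs scoring >= threshold."""
    if not target:
        return []
    matches = []
    for s in source:
        ps = preprocess(s)
        best = max(target, key=lambda t: similarity(ps, preprocess(t)))
        score = similarity(ps, preprocess(best))
        if score >= threshold:
            matches.append((s, best, round(score, 3)))
    return matches
```

For example, `align(["hasAuthor", "Paper"], ["has_author", "Article"])` returns `[("hasAuthor", "has_author", 1.0)]`. The two property labels become identical after preprocessing, while "Paper" has no sufficiently similar target. Swapping in a different metric or preprocessing step changes which pairs survive, which is why the choice of metric matters.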
One contribution of our string similarity metric survey is a quantification of the difference in performance between aligning classes and aligning properties (relations between classes). Put simply: aligning properties is hard, and existing string similarity metrics offer little help. We therefore develop a new string-based alignment approach that performs better on properties. Evaluating that approach is difficult, however, because the only existing alignment benchmark that includes properties is, in our view, unrealistic: it presents every relation in the reference alignment as completely certain, whereas human experts asked to align an ontology do not have this degree of confidence. We therefore present a more nuanced version of this benchmark, created through a combination of expert survey and crowdsourcing. We then present our new string-based property alignment system and evaluate its performance on both the current benchmark and our proposed revision. Our property-centric string metric can be configured for either high precision or high recall. The results show a five-fold increase in precision and a doubling of recall over an approach based on the best current string metric. Finally, we apply our system to a real-world test case and analyze the results.
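A reference alignment that attaches a confidence to each correspondence, rather than treating all of them as certain, changes how precision and recall are computed. The sketch below shows one plausible graded scoring scheme. It assumes each reference confidence is the fraction of surveyed experts or crowd workers who accepted the correspondence, and it uses a confidence threshold to illustrate the precision/recall trade-off. It is not necessarily the scoring used in this dissertation. The function name, data layout, and entity names are hypothetical.

```python
def graded_precision_recall(system, reference, threshold=0.0):
    """Score system correspondences against a reference alignment with graded confidences.

    system:    dict mapping (source_entity, target_entity) -> system confidence in [0, 1]
    reference: dict mapping (source_entity, target_entity) -> reference confidence in [0, 1]
               (assumed here to be the fraction of human judges accepting the pair)
    threshold: only system correspondences with confidence >= threshold are returned;
               raising it tends to favor precision, lowering it tends to favor recall.
    """
    returned = [pair for pair, conf in system.items() if conf >= threshold]
    # Each returned pair earns its reference confidence as credit (0 if not in the reference).
    credit = sum(reference.get(pair, 0.0) for pair in returned)
    total = sum(reference.values())
    precision = credit / len(returned) if returned else 0.0
    recall = credit / total if total else 0.0
    return precision, recall


# Hypothetical property correspondences between two ontologies "a" and "b".
reference = {("a:writes", "b:authorOf"): 0.9, ("a:reviews", "b:refereeOf"): 0.5}
system = {
    ("a:writes", "b:authorOf"): 0.95,
    ("a:reviews", "b:refereeOf"): 0.6,
    ("a:cites", "b:authorOf"): 0.4,
}
print(graded_precision_recall(system, reference, threshold=0.0))  # recall-oriented setting
print(graded_precision_recall(system, reference, threshold=0.9))  # precision-oriented setting
```

With a threshold of 0.0, all three pairs are returned: recall is 1.0 but precision is about 0.47. With a threshold of 0.9, only the most confident pair is returned: precision rises to 0.9 and recall drops to about 0.64. Under this scheme, a correspondence that only half of the judges accept earns only half credit.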
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright 2014, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.