Michelle Cheatham (Committee Member), Pascal Hitzler (Advisor), Krzysztof Janowicz (Committee Member), Khrisnaprasad Thirunarayan (Committee Member)
Doctor of Philosophy (PhD)
Data integration is concerned with providing a unified access to data residing at multiple sources. Such a unified access is realized by having a global schema and a set of mappings between the global schema and the local schemas of each data source, which specify how user queries at the global schema can be translated into queries at the local schemas. Data sources are typically developed and maintained independently, and thus, highly heterogeneous. This causes difficulties in integration because of the lack of interoperability in the aspect of architecture, data format, as well as syntax and semantics of the data. This dissertation represents a study on how small, self-contained ontologies, called ontology design patterns, can be employed to provide semantic interoperability in a cross-repository data integration system. The idea of this so-called ontology pattern- based data integration is that a collection of ontology design patterns can act as the global schema that still contains sufficient semantics, but is also flexible and simple enough to be used by linked data providers. On the one side, this differs from existing ontology-based solutions, which are based on large, monolithic ontologies that provide very rich semantics, but enforce too restrictive ontological choices, hence are shunned by many data providers. On the other side, this also differs from the purely linked data based solutions, which do offer simplicity and flexibility in data publishing, but too little in terms of semantic interoperability. We demonstrate the feasibility of this idea through the actual development of a large scale data integration project involving seven ocean science data repositories from five institutions in the U.S. In addition, we make two contributions as part of this dissertation work, which also play crucial roles in the aforementioned data integration project. First, we develop a collection of more than a dozen ontology design patterns that capture the key notions in the ocean science occurring in the participating data repositories. These patterns contain axiomatization of the key notions and were developed with an intensive involvement from the domain experts. Modeling of the patterns was done in a systematic workflow to ensure modularity, reusability, and flexibility of the whole pattern collection. Second, we propose the so-called pattern views that allow data providers to publish their data in very simple intermediate schema and show that they can greatly assist data providers to publish their data without requiring a thorough understanding of the axiomatization of the patterns.
Department or Program
Department of Computer Science and Engineering
Copyright, all rights reserved. My ETD will be available under the "Fair Use" terms of copyright law.