Publication Date

2012

Document Type

Dissertation

Committee Members

Pascal Hitzler (Committee Member), Pankaj Mehra (Committee Member), Amit Sheth (Advisor), Shaojun Wang (Committee Member), Gerhard Weikum (Committee Member)

Degree Name

Doctor of Philosophy (PhD)

Abstract

I present a hermeneutic method for growing the amount of knowledge available on the Web. The method involves background knowledge, Information Extraction techniques, and validation through discourse and use of the extracted information.

I present the metaphor of the "Circle of Knowledge on the Web". In this context, knowledge acquisition on the Web is seen as analogous to the way scientific disciplines gradually increase the knowledge available in their field.

Here, formal models of interest domains are created automatically or manually and then validated by implicit and explicit validation methods before the statements in the created models can be added to larger knowledge repositories, such as the Linked Open Data cloud. This knowledge is then available for the next iteration of the knowledge acquisition cycle.

I will give both a theoretical underpinning and practical methods for the acquisition of knowledge in collaborative systems. I will cover both the Knowledge Engineering angle and the Information Extraction angle of this problem. Unlike traditional approaches, however, this dissertation will show how Information Extraction can be incorporated into a mostly Knowledge Engineering-based approach as well as how an Information Extraction-based approach can make use of engineered concept repositories. Validation is seen as an integral part of this systemic approach to knowledge acquisition.

The centerpiece of the dissertation is a domain model extraction framework that implements the idea of the "Circle of Knowledge" to automatically create semantic models for domains of interest. It splits the involved Information Extraction tasks into that of Domain Definition, in which pertinent concepts are identified and categorized, and that of Domain Description, in which facts that describe the extracted concepts are extracted from free text. I then outline a social computing strategy for information validation in order to create knowledge from the extracted models.

This dissertation makes the following contributions:

- A hermeneutic methodology for knowledge acquisition within a system, involving
  - Human and artificial agents,
  - Formally represented knowledge,
  - Textual information,
  - Information Extraction methods and
  - Information validation techniques
- Ontology Design
- Automatic Domain Model creation
  - Top-down Domain hierarchy extraction (Domain Definition)
  - Bottom-up Pattern-based extraction of named relationships (Domain Description)
- Distantly supervised Relational Targeting Information Extraction
- Probabilistic positive-only Multi-class classifier
- Statistical measure for relationship pertinence
- Recall enhancement using pattern generalization
- Implicit and Explicit Information validation

Page Count

237

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2012

