Publication Date

2008

Document Type

Dissertation

Committee Members

Guozhu Dong (Committee Member), Vasant Honavar (Committee Member), Michael Raymer (Committee Member), Amit Sheth (Advisor), Thaddeaus Tarpey (Committee Member), Shaojun Wang (Committee Member)

Degree Name

Doctor of Philosophy (PhD)

Abstract

The information access paradigm offered by most contemporary text information systems is a search-and-sift paradigm in which users must manually glean and aggregate relevant information from the large number of documents typically returned in response to keyword queries. Expecting users to glean and aggregate information has led to several inadequacies in these information systems. Owing to the size of many text databases, search-and-sift is very tedious, often requiring repeated keyword searches that refine or generalize query terms. A more serious limitation arises from the lack of automated mechanisms to aggregate content across different documents to discover new knowledge. This dissertation focuses on processing text to assign semantic interpretations to its content (extracting semantic metadata) and on the design of algorithms and heuristics that use the extracted semantic metadata to support knowledge discovery operations over text content. Contributions in extracting semantic metadata in this dissertation cover the extraction of compound entities and of complex relationships connecting entities. Extraction results are represented using a standard Semantic Web representation language (RDF) and are manually evaluated for accuracy. The knowledge discovery algorithms presented herein operate on RDF data. To further improve access mechanisms to text content, applications supporting semantic browsing and semantic search of text are presented.

Page Count

147

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2008

