Publication Date

2007

Document Type

Thesis

Committee Members

Krishnaprasad Thirunarayan (Advisor)

Degree Name

Master of Science in Computer Engineering (MSCE)

Abstract

Due to the increasing pervasiveness of data sets using the XML data format, numerous query languages have been proposed that exploit the structure inherent in XML. Many such query languages, supported by specialized XML search engines, are complex and not suitable for naive users. A simple keyword-based query language is described that not only exploits the structure of XML documents to extract relevant fragments, but can also fall back on retrieval through plain text search. This thesis focuses on developing a prototype implementation of a Coherent Keyword Based XML Query Language. It analyzes the typical challenges posed by the semi-structured nature of the XML format, and then describes the design and implementation of a framework that can index and search XML datasets. The prototype, built on Apache Lucene (a Java-based text indexing and search API), incorporates several available techniques to obtain precise and coherent results. It also provides a simple user interface for browsing the vicinity of result document fragments.

Page Count

79

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2007

