Publication Date
2007
Document Type
Thesis
Committee Members
Krishnaprasad Thirunarayan (Advisor)
Degree Name
Master of Science in Computer Engineering (MSCE)
Abstract
Due to the increasing pervasiveness of data sets using the XML data format, numerous query languages have been proposed that exploit the structure inherent in XML. Many such query languages, supported by specialized XML search engines, are complex and not suitable for naive users. A simple keyword-based query language is described which not only exploits the structure of XML documents to extract relevant fragments, but can also fall back on retrieval through plain text search. This thesis focuses on developing a prototype implementation for a Coherent Keyword Based XML Query Language. It analyzes the typical challenges posed by the semi-structured nature of the XML format, and then describes the design and implementation of a framework that can index and search XML data sets. The prototype, built on Apache Lucene (a Java-based text indexing and search API), incorporates several available techniques to obtain precise and coherent results. It also provides a simple user interface to browse the vicinity of result document fragments.
Page Count
79
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2007
Copyright
Copyright 2007, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.