Publication Date

2012

Document Type

Thesis

Committee Members

Guozhu Dong (Committee Member), Pascal Hitzler (Committee Chair), Krishnaprasad Thirunarayan (Committee Member)

Degree Name

Master of Science (MS)

Abstract

The Semantic Web and OWL are relatively new and growing concepts on the World Wide Web. Because they are so new, there are relatively few applications and tools that exploit their potential. Although the Semantic Web has many components, this thesis focuses on the research question, "How do we go about developing a web crawler for the Semantic Web that locates and retrieves OWL documents?" Specifically, we hypothesize that by giving priority to URIs of OWL documents, including all URIs found within those OWL documents, over other types of references, we will locate more OWL documents than by any other type of traversal. We reason that OWL documents contain proportionally more references to other OWL documents than non-OWL documents do, so giving them priority should yield more OWL files by the time the crawl terminates than any other traversal method.

To build such an OWL priority queue, we needed heuristics that predict which URIs lead to OWL documents during real-time parsing of Semantic Web documents. These heuristics are based on filename extensions and OWL language constructs, neither of which is a definitive indicator of a document's type before retrieval. However, if our reasoning is correct, URIs found in an OWL document will likely lead to more OWL documents, so that when the crawl ends on reaching a maximum document limit, we will have retrieved more OWL documents than with other methods such as breadth-first or load-balanced traversal. We conclude with an evaluation of our results to test the validity of our hypothesis and to determine whether it merits future research.
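
The idea of an OWL priority queue can be illustrated with a minimal sketch. It is not the thesis implementation: the extension list, the construct list, the two-level priority scheme, and the names `predicts_owl`, `OwlPriorityFrontier`, and `crawl` are all assumptions made for illustration, and `fetch` is a hypothetical callable standing in for real retrieval and parsing.

```python
import heapq
import itertools
from urllib.parse import urlparse

# Assumed heuristic inputs; the abstract does not list the actual rules.
OWL_EXTENSIONS = (".owl", ".rdf")
OWL_CONSTRUCTS = ("owl:imports", "owl:sameAs", "owl:equivalentClass")


def predicts_owl(uri, construct=None):
    """Guess, before retrieval, whether a URI leads to an OWL document."""
    path = urlparse(uri).path.lower()
    if path.endswith(OWL_EXTENSIONS):
        return True
    # The construct is the (hypothetical) OWL term the URI was found under.
    return construct in OWL_CONSTRUCTS


class OwlPriorityFrontier:
    """Crawl frontier that serves predicted-OWL URIs before all others."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # keeps FIFO order within a priority
        self._seen = set()

    def add(self, uri, construct=None, found_in_owl=False):
        if uri in self._seen:
            return
        self._seen.add(uri)
        # Priority 0: predicted OWL, or any URI found inside an OWL document.
        high = found_in_owl or predicts_owl(uri, construct)
        heapq.heappush(self._heap, (0 if high else 1, next(self._order), uri))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def crawl(seeds, fetch, max_docs):
    """Crawl until the frontier is empty or max_docs documents were fetched.

    fetch(uri) is assumed to return (is_owl, [(link, construct), ...]).
    Returns the list of OWL document URIs retrieved.
    """
    frontier = OwlPriorityFrontier()
    for seed in seeds:
        frontier.add(seed)
    owl_found, fetched = [], 0
    while len(frontier) and fetched < max_docs:
        uri = frontier.pop()
        is_owl, links = fetch(uri)
        fetched += 1
        if is_owl:
            owl_found.append(uri)
        for link, construct in links:
            frontier.add(link, construct, found_in_owl=is_owl)
    return owl_found


# Toy in-memory "web" used instead of real HTTP requests.
TOY_WEB = {
    "http://ex.org/index.html": (False, [("http://ex.org/a.html", None),
                                         ("http://ex.org/o1.owl", None)]),
    "http://ex.org/a.html": (False, []),
    "http://ex.org/o1.owl": (True, [("http://ex.org/o2", "owl:imports")]),
    "http://ex.org/o2": (True, []),
}

result = crawl(["http://ex.org/index.html"], lambda u: TOY_WEB[u], max_docs=3)
print(result)
```

With a limit of three documents on this toy graph, the sketch retrieves both OWL documents, whereas a plain breadth-first order would spend one of its three fetches on `a.html` and reach only one of them.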

Page Count

90

Department or Program

Department of Computer Science

Year Degree Awarded

2012

Creative Commons License

Creative Commons Attribution-Noncommercial-Share Alike 3.0 License

