Publication Date
2007
Document Type
Thesis
Committee Members
Krishnaprasad Thirunarayan (Advisor)
Degree Name
Master of Science (MS)
Abstract
The growing archive of News documents calls for efficient techniques to retrieve and visualize its content. We present a timeline-based graphical interface for this purpose. The timeline plots, against date, the number of documents that support an association between two metadata items, each of which may be an entity, event, country, person, etc. Queries are formulated over entity, event, country, and person metadata extracted from the text of the News documents using proprietary named-entity recognizers. The timeline also provides a means to index and access the relevant documents. Associations inferred from document-level metadata are not always correct when a News document contains multiple stories. These mis-associations can be eliminated by requiring paragraph- or sentence-level co-occurrence of the corresponding phrases. Our refined timeline points are also annotated with cluster labels generated from headlines and sentences. We decoupled the document archive from the GUI by generating the timeline metadata offline, and we provide two separate renderings of the timeline, one in Java and one in Adobe Flex.
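The counting behind a timeline point, and the effect of requiring sentence-level rather than document-level co-occurrence, can be illustrated with a minimal Python sketch. It assumes each document has already been split into sentences and that each sentence carries the set of metadata terms a named-entity recognizer tagged in it. The names `NewsDoc`, `timeline`, and `sentence_level`, as well as the sample documents, are hypothetical illustrations, not the thesis's actual implementation or data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class NewsDoc:
    # Hypothetical record: publication date plus, for each sentence,
    # the set of metadata terms tagged in that sentence.
    published: date
    sentences: list[set[str]]


def timeline(docs, a, b, sentence_level=True):
    """Return {date: number of documents supporting the a-b association}."""
    counts = Counter()
    for doc in docs:
        if sentence_level:
            # Stricter test: a and b must appear in the same sentence.
            supported = any(a in s and b in s for s in doc.sentences)
        else:
            # Document-level test: a and b may appear anywhere in the document.
            terms = set().union(*doc.sentences)
            supported = a in terms and b in terms
        if supported:
            counts[doc.published] += 1
    # Timeline points ordered by date.
    return dict(sorted(counts.items()))


# Hypothetical sample: the second document bundles two unrelated stories.
docs = [
    NewsDoc(date(2005, 10, 8), [{"earthquake", "Pakistan"}]),
    NewsDoc(date(2005, 10, 8), [{"earthquake"}, {"Pakistan", "election"}]),
    NewsDoc(date(2005, 10, 9), [{"Pakistan", "earthquake", "aid"}]),
]
print(timeline(docs, "earthquake", "Pakistan"))
print(timeline(docs, "earthquake", "Pakistan", sentence_level=False))
```

With the document-level test, the multi-story document adds a spurious count on 8 October; the sentence-level test drops it. A paragraph-level variant would follow the same pattern, with paragraphs in place of sentences.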
Page Count
87
Department or Program
Department of Computer Science
Year Degree Awarded
2007
Copyright
Copyright 2007, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.