Publication Date

2007

Document Type

Thesis

Committee Members

Krishnaprasad Thirunarayan (Advisor)

Degree Name

Master of Science (MS)

Abstract

The growing archive of news documents calls for efficient techniques to retrieve and visualize its content. We present a timeline-based graphical interface for this purpose. The timeline is a graph, plotted against date, of the number of documents supporting an association between an entity (or event, country, person, etc.) and an event (or entity, country, person, etc.). The query is formulated from entity, event, country, and person metadata extracted from the text of the news documents using proprietary named-entity recognizers. The timeline also provides a means to index and access relevant documents. Associations inferred from document-level metadata are not always correct when a single news document contains multiple news stories. These mis-associations can be eliminated by requiring paragraph-level or sentence-level co-occurrence of the corresponding phrases. Our refined timeline points are also annotated with cluster labels generated from headlines and sentences. We have decoupled the document archive from the GUI by generating the timeline metadata offline, and we provide two separate renderings of the timeline, one in Java and one in Adobe Flex.
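The difference between document-level and paragraph-level association described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `Document` class, the `cooccurs` and `timeline` functions, and the plain substring matching are all assumptions made for illustration, standing in for the proprietary named-entity recognizers and the offline metadata generation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical stand-in for a news document in the archive."""
    published: date
    paragraphs: list[str]


def cooccurs(text: str, a: str, b: str) -> bool:
    """True if both phrases appear in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return a.lower() in lowered and b.lower() in lowered


def timeline(docs, a, b, paragraph_level=True):
    """Count, per date, the documents that support an association between a and b.

    With paragraph_level=True a document counts only if some single paragraph
    mentions both phrases; otherwise the whole document is treated as one unit.
    """
    counts = Counter()
    for doc in docs:
        if paragraph_level:
            hit = any(cooccurs(p, a, b) for p in doc.paragraphs)
        else:
            hit = cooccurs(" ".join(doc.paragraphs), a, b)
        if hit:
            counts[doc.published] += 1
    # Sort by date so the result reads as a timeline.
    return dict(sorted(counts.items()))
```

Under these assumptions, a document that mentions the two phrases in separate, unrelated stories contributes to the document-level timeline but not to the paragraph-level one, which is the kind of mis-association the refinement is meant to remove.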

Page Count

87

Department or Program

Department of Computer Science

Year Degree Awarded

2007

