Publication Date
2015
Document Type
Thesis
Committee Members
Keke Chen (Committee Member), Tanvi Banerjee (Committee Member), Krishnaprasad Thirunarayan (Advisor)
Degree Name
Master of Science (MS)
Abstract
Search engine result snippets are an important source of information, giving users quick insight into the corresponding result documents. When the search terms are too general, such as a person's name or a company's name, creating a snippet that effectively summarizes a document's content can be challenging. The search term occurs many times in the top-ranked documents, and there is no simple means of selecting which of the sentences containing it should form the result snippet. Web pages classified as narratives or news articles can contain multiple explicit, implicit, and relative temporal expressions, and based on these expressions their sentences can be ordered on a timeline. In this thesis, we propose generating an alternative search result snippet by exploiting the temporal expressions embedded within the pages, using a timeline map. Our snippet generation method is mainly targeted at general search terms. At present, when the search terms are too general, existing systems generate static snippets for the result pages, such as displaying the first line. In our approach, we introduce an alternative method that extracts and selects temporal data from these pages to adapt a snippet into a more effective summary. Specifically, it selects and blends "temporally interesting" sentences. Using the weighted kappa measure, we evaluate our approach by comparing snippets generated for multiple search terms by existing systems with snippets generated by our approach.
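The abstract names the weighted kappa measure but does not state which weighting scheme was used. As a hedged illustration of the statistic itself, and not the thesis's own evaluation code, the sketch below computes Cohen's weighted kappa for two raters who score items (for instance, snippets) on an ordinal scale. The function name `weighted_kappa`, the use of 0-based integer ratings, and the choice between linear and quadratic disagreement weights are all assumptions made for this example.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, num_categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale 0..num_categories-1.

    Hypothetical helper for illustration; weighting is "linear" or "quadratic".
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_categories < 2:
        raise ValueError("need at least two rating categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = num_categories
    n = len(rater_a)
    if any(not 0 <= r < k for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must lie in 0..num_categories-1")

    # Observed joint distribution of (rating_a, rating_b) pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal counts for each rater (missing categories count as 0).
    hist_a = Counter(rater_a)
    hist_b = Counter(rater_b)

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, growing with rating distance.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = weight(i, j)
            # Expected joint probability if the raters were independent.
            expected = (hist_a[i] / n) * (hist_b[j] / n)
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * expected

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed_disagreement / expected_disagreement
```

For example, `weighted_kappa([0, 1, 2], [0, 2, 1], 3)` evaluates to 0.25 with linear weights and to 0.5 with `weighting="quadratic"` (up to floating-point rounding). Quadratic weights penalize large disagreements more heavily than near-misses, which is why the same ratings yield a higher agreement score under that scheme.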
Page Count
63
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2015
Copyright
Copyright 2015, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.