Delroy Cameron (Committee Member), Thirunarayan Krishnaprasad (Committee Member), Michael Raymer (Committee Member), Thomas Rindflesch (Committee Member), Amit Sheth (Advisor)
Master of Science (MS)
Automatic generation of summaries that capture the salient aspects of a search result set (i.e., automatic summarization) has become an important task in biomedical research. Automatic summarization offers an avenue for overcoming the information overload problem prevalent in large online digital libraries. However, across many of the knowledge-driven approaches to automatic summarization, it is not always clear which features most strongly influence the quality of a summary. Instead, there has been considerable focus on utilizing schema knowledge to facilitate browsing and exploration of generated summaries a posteriori. Informative features should not be ignored, since they could be used to optimize the models that generate these semantic summaries in the first place. In this research, we adopt a leave-one-out approach to assess the impact of various features on the quality of automatically generated summaries that contain structured background knowledge. We first create gold standard summaries by extraction and validation, using information-theoretic methods. The semantic summaries are then transformed into an equivalent textual format. Finally, various similarity metrics, such as cosine similarity, Euclidean distance, and Jensen-Shannon divergence, are computed under different feature combinations to assess summary quality against the textual gold standard. We report on the relative importance of the various features used to automatically generate the semantic summaries in a biomedical application. Our evaluation suggests that the proposed approach is an effective automatic evaluation method for assessing feature importance in automatically generated semantic summaries.
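The evaluation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the whitespace tokenization, the bag-of-words vectors, the choice of cosine similarity as the scoring metric, and the names `term_vector`, `leave_one_out`, and `generate_summary` are all assumptions made for illustration. The three metric functions follow their standard definitions.

```python
import math
from collections import Counter


def term_vector(text, vocab):
    """Bag-of-words counts over a fixed vocabulary (assumed representation)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Define similarity as 0.0 when either vector is all zeros.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def js_divergence(a, b):
    """Jensen-Shannon divergence in bits (base-2 logarithm)."""
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        raise ValueError("vectors must contain at least one non-zero count")
    p = [x / total_a for x in a]
    q = [y / total_b for y in b]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def leave_one_out(features, generate_summary, gold_text):
    """Score drop (cosine similarity to the gold text) when each feature is removed.

    `generate_summary` is a hypothetical caller-supplied function that maps a
    list of active feature names to a textual summary.
    """
    def score(active):
        candidate = generate_summary(active)
        vocab = sorted(set(gold_text.lower().split()) | set(candidate.lower().split()))
        return cosine_similarity(term_vector(gold_text, vocab),
                                 term_vector(candidate, vocab))

    baseline = score(list(features))
    return {f: baseline - score([g for g in features if g != f]) for f in features}
```

Under these assumptions, a positive value for a feature means that removing it lowered similarity to the gold standard, so the feature helped; a negative value means the summary was closer to the gold standard without it.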
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright 2016, all rights reserved. My ETD will be available under the "Fair Use" terms of copyright law.