Publication Date
2020
Document Type
Thesis
Committee Members
Soon M. Chung, Ph.D. (Advisor); Nikolaos Bourbakis, Ph.D. (Committee Member); Vincent A. Schmidt, Ph.D. (Committee Member)
Degree Name
Master of Science (MS)
Abstract
In the last decade, the advent of social media and microblogging services has inevitably changed our world. These services produce vast amounts of streaming data, and one of the most important ways of analyzing and discovering interesting trends in the streaming data is through clustering. In clustering streaming data, it is desirable to perform a single pass over incoming data, so that old data does not need to be processed again, and the clustering model should evolve over time without losing any important feature statistics of the data. In this research, we have developed a new clustering system that clusters social media data based on their textual content and displays the clusters and their locations on a map. Our system takes advantage of a text stream clustering algorithm that uses a two-phase clustering process. The online micro-clustering phase incrementally creates micro-clusters, called text droplets, that represent enough information about topics occurring in the text stream. The off-line macro-clustering phase clusters the micro-clusters for a user-specified time interval and can change macro-clustering algorithms dynamically. Our experiments demonstrated that the performance of our system is scalable, and it can be easily used by first responders and crisis management personnel to quickly determine whether a crisis is happening, where it is concentrated, and which resources are best to deploy to the situation.
Page Count
53
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2020
Copyright
Copyright 2020, all rights reserved. My ETD will be available under the "Fair Use" terms of copyright law.
ORCID ID
0000-0001-6465-5303