Publication Date

2020

Document Type

Thesis

Committee Members

Soon M. Chung, Ph.D. (Advisor); Nikolaos Bourbakis, Ph.D. (Committee Member); Vincent A. Schmidt, Ph.D. (Committee Member)

Degree Name

Master of Science (MS)

Abstract

In the last decade, the advent of social media and microblogging services has inevitably changed our world. These services produce vast amounts of streaming data, and clustering is one of the most important ways of analyzing that data and discovering interesting trends in it. When clustering streaming data, it is desirable to perform a single pass over the incoming data, so that old data does not need to be processed again, and the clustering model should evolve over time without losing important feature statistics of the data. In this research, we developed a new clustering system that clusters social media data based on their textual content and displays the clusters and their locations on a map. Our system takes advantage of a text stream clustering algorithm that uses a two-phase clustering process. The online micro-clustering phase incrementally creates micro-clusters, called text droplets, that capture enough information about the topics occurring in the text stream. The offline macro-clustering phase clusters the micro-clusters for a user-specified time interval and can change macro-clustering algorithms dynamically. Our experiments demonstrated that the performance of our system is scalable, and it can be easily used by first responders and crisis management personnel to quickly determine whether a crisis is happening, where it is concentrated, and what resources are best to deploy to the situation.
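
The two-phase process described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the `Droplet` class, the whitespace tokenizer, the cosine-similarity merge rule, the `threshold` value, and the greedy grouping used for the macro phase are all assumptions chosen only to show the single-pass online phase and the time-windowed offline phase.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Droplet:
    """Hypothetical micro-cluster summary: term counts, size, time span."""

    def __init__(self, terms, timestamp):
        self.terms = Counter(terms)
        self.size = 1
        self.first_seen = timestamp
        self.last_seen = timestamp

    def absorb(self, terms, timestamp):
        # Update the summary in place; the raw text is never stored.
        self.terms.update(terms)
        self.size += 1
        self.last_seen = max(self.last_seen, timestamp)


def micro_cluster(stream, threshold=0.3):
    """Online phase: a single pass over (timestamp, text) pairs."""
    droplets = []
    for timestamp, text in stream:
        terms = Counter(text.lower().split())  # assumed tokenizer
        best = max(droplets, key=lambda d: cosine(d.terms, terms), default=None)
        if best is not None and cosine(best.terms, terms) >= threshold:
            best.absorb(terms, timestamp)
        else:
            droplets.append(Droplet(terms, timestamp))
    return droplets


def macro_cluster(droplets, start, end, threshold=0.3):
    """Offline phase: group droplets active in [start, end].

    The greedy grouping here is a stand-in; any macro-clustering
    algorithm could be applied to the droplet summaries instead.
    """
    groups = []
    for d in droplets:
        if d.last_seen < start or d.first_seen > end:
            continue  # droplet lies outside the requested interval
        for g in groups:
            if cosine(g[0].terms, d.terms) >= threshold:
                g.append(d)
                break
        else:
            groups.append([d])
    return groups


stream = [
    (1, "flood water rising downtown"),
    (2, "flood water downtown streets"),
    (3, "concert tickets on sale"),
    (4, "concert tickets sold out"),
]
droplets = micro_cluster(stream)
recent = macro_cluster(droplets, start=3, end=4)
```

Because the offline phase works only on the droplet summaries, the interval or the grouping rule can be changed without reading the original stream again.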

Page Count

53

Year Degree Awarded

2020

ORCID ID

0000-0001-6465-5303

