Publication Date


Document Type


Committee Members

Keke Chen (Advisor), Guozhu Dong (Committee Member), Mateen Rizki (Other), Thomas Wischgoll (Committee Member)

Degree Name

Master of Science in Computer Engineering (MSCE)


With the development and deployment of ubiquitous information sensing, mobile devices, wireless sensor networks, RFID readers, simulation, and computer-generated software logs, big data have become precious resources for scientific study, business intelligence, and national security. Visual cluster analysis is one of the most intuitive and effective analysis methods, yet it remains a significant challenge for big datasets. First, existing visualization models need to be updated to process big data in parallel. Second, processing big data inevitably brings large latency, which conflicts with the interactivity that visual exploration requires. In this thesis, we develop the CloudVista framework to address the common problems with data reduction methods and the conflict between the latency of big data processing and the interactivity desired in visual cluster exploration. The framework has several components: (1) the visual frame data structure and the previously developed VISTA visualization model, adapted for parallel processing; (2) the RandGen algorithm, which generates batches of meaningful visual frames; and (3) a workflow that minimizes the cost of big data processing. The CloudVista demonstration system is designed and implemented with web services and Hadoop/MapReduce, assuming the entire big dataset is stored in the cloud. Finally, we present visualization results and performance evaluation results obtained with the demonstration system.
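The abstract describes computing visual frames in parallel with MapReduce. The following is a minimal sketch of how such a computation could be organized, not CloudVista's actual implementation. It assumes that a visual frame is a 2D grid of point counts and that records are normalized to [0, 1] and projected with a VISTA-style, alpha-weighted star-coordinates mapping. All names (`star_axes`, `project`, `map_phase`, `reduce_phase`, `alphas`, `resolution`) are hypothetical. The map phase emits `(cell, 1)` for each record, and the reduce phase sums the counts per cell. Plain Python generators stand in for Hadoop mappers and reducers.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]  # (row, col) of a grid cell in the visual frame


def star_axes(k: int) -> List[Tuple[float, float]]:
    """Unit vectors evenly spaced on a circle, one per dimension (star coordinates)."""
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k)) for i in range(k)]


def project(record: List[float], alphas: List[float],
            axes: List[Tuple[float, float]], c: float = 1.0) -> Tuple[float, float]:
    """Assumed alpha-weighted linear mapping of a normalized k-dim record to 2D."""
    k = len(record)
    x = c / k * sum(a * v * ax[0] for a, v, ax in zip(alphas, record, axes))
    y = c / k * sum(a * v * ax[1] for a, v, ax in zip(alphas, record, axes))
    return x, y


def map_phase(records: Iterable[List[float]], alphas: List[float],
              resolution: int) -> Iterable[Tuple[Cell, int]]:
    """Hypothetical mapper: emit (cell, 1) for every record."""
    axes = star_axes(len(alphas))
    for r in records:
        x, y = project(r, alphas, axes)
        # With values in [0, 1], alphas in [-1, 1] and c = 1, x and y lie in [-1, 1].
        col = max(0, min(int((x + 1) / 2 * resolution), resolution - 1))
        row = max(0, min(int((y + 1) / 2 * resolution), resolution - 1))
        yield (row, col), 1


def reduce_phase(pairs: Iterable[Tuple[Cell, int]]) -> Dict[Cell, int]:
    """Hypothetical reducer: sum the counts that fall into each cell."""
    frame: Dict[Cell, int] = defaultdict(int)
    for cell, count in pairs:
        frame[cell] += count
    return dict(frame)


if __name__ == "__main__":
    # Two data partitions, as if stored in separate HDFS blocks (illustrative data).
    partitions = [[[0.2, 0.8, 0.5], [0.1, 0.9, 0.4]], [[0.7, 0.3, 0.6]]]
    alphas = [1.0, 0.5, -0.5]
    pairs = [p for part in partitions for p in map_phase(part, alphas, resolution=8)]
    print(reduce_phase(pairs))
```

Because each cell count is a plain sum, the same reducer could also serve as a combiner on each partition. Under this assumption, generating a batch of frames for several parameter settings, as RandGen does, would only repeat the map phase with different `alphas` values over the same data.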

Page Count


Department or Program

Department of Computer Science and Engineering

Year Degree Awarded