Anomaly detection for data streams in large-scale distributed heterogeneous computing environments

Document Type

Conference Proceeding

Publication Date



Counteracting cyber threats to secure cyberspace is increasingly challenging: cyber-attacks are growing stealthier and more sophisticated, while the protected cyber domains are rapidly growing in complexity and scale. It is therefore important to design big data-driven cybersecurity solutions that use principled data analytic methods to derive actionable intelligence, effectively and efficiently, from the available heterogeneous sources of information. In this work, we present a scalable distributed framework that collects and processes extreme-scale network and computing system traffic and status data from multiple sources that collectively represent the system under study. On top of this framework, we develop and apply real-time adaptive data analytics for anomaly detection in order to monitor, understand, maintain, and improve cybersecurity. The data analytics will integrate multiple sophisticated machine learning algorithms with a human in the loop for iterative ensemble learning. Given the volume, speed, and complexity of the data gathered, and the need for real-time analytics, the data processing framework must scale to big data while keeping latency low. Our proposed big-data analytics will be implemented on an Apache Spark computing cluster. The analytics developed will offer significant improvements over existing methods of real-time anomaly detection. Our preliminary evaluation studies have shown that the developed techniques improve the ability to defend against cyber threats.
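
The idea of combining several detectors over a stream and refining the ensemble with analyst feedback can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use Apache Spark: two simple rolling-window statistical detectors stand in for the machine learning algorithms the abstract mentions, and every name here (ZScoreDetector, MadDetector, FeedbackEnsemble, the window sizes, thresholds, and learning rate) is a hypothetical choice made for illustration.

```python
from collections import deque
from statistics import mean, median, pstdev


class ZScoreDetector:
    """Flags a value that lies far from the rolling mean (hypothetical detector)."""

    def __init__(self, window=50, threshold=3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def score(self, x):
        # Judge x against past values only, then add it to the window.
        flagged = False
        if len(self.values) >= 5:
            mu, sigma = mean(self.values), pstdev(self.values)
            flagged = sigma > 0 and abs(x - mu) / sigma > self.threshold
        self.values.append(x)
        return flagged


class MadDetector:
    """Flags a value that lies far from the rolling median (hypothetical detector)."""

    def __init__(self, window=50, threshold=5.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def score(self, x):
        flagged = False
        if len(self.values) >= 5:
            med = median(self.values)
            mad = median(abs(v - med) for v in self.values)
            flagged = mad > 0 and abs(x - med) / mad > self.threshold
        self.values.append(x)
        return flagged


class FeedbackEnsemble:
    """Weighted vote over detectors; analyst labels adjust the weights."""

    def __init__(self, detectors):
        self.detectors = detectors
        self.weights = [1.0] * len(detectors)
        self.last_votes = []

    def observe(self, x):
        # Report an anomaly when at least half of the total weight votes for it.
        self.last_votes = [d.score(x) for d in self.detectors]
        voted = sum(w for w, v in zip(self.weights, self.last_votes) if v)
        return voted >= 0.5 * sum(self.weights)

    def feedback(self, is_anomaly, rate=0.2):
        # Analyst label for the most recent observation: raise the weight of
        # detectors that agreed with the label and lower the others.
        for i, vote in enumerate(self.last_votes):
            self.weights[i] *= (1 + rate) if vote == is_anomaly else (1 - rate)


# Made-up stream: ten ordinary readings followed by one spike.
stream = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 100]
ensemble = FeedbackEnsemble([ZScoreDetector(), MadDetector()])
flags = [ensemble.observe(x) for x in stream]
ensemble.feedback(is_anomaly=True)  # analyst confirms the spike
```

In a distributed setting, the per-record scoring would run inside the streaming engine and the feedback step would update shared model state; this sketch only shows the control flow on a single machine.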
