Publication Date


Document Type


Committee Members

John C. Gallagher (Advisor), Mateen M. Rizki (Committee Member), Thomas Wischgoll (Committee Member)

Degree Name

Master of Science (MS)


Classification of environmental scenes and detection of events in one's environment from audio signals enables one to create better planning agents, intelligent navigation systems, pattern recognition systems, and audio surveillance systems. This thesis will explore the use of Convolutional Neural Networks (CNNs) with spectrograms and raw audio waveforms as inputs, together with Deep Neural Networks (DNNs) with hand-engineered features extracted from large-scale feature extraction schemes, to identify acoustic scenes and events. The first part focuses on building an audio pattern recognition system capable of detecting whether there are zero, one, or two DJI Phantoms in the scene within the range of a stereo microphone. The ability to distinguish the presence of multiple UAVs could be used to augment information from other sensors less capable of making such determinations. The second part of the thesis focuses on building an acoustic scene detector for Task 1a in the DCASE2018 challenge. In both cases, this document will explain the pre-processing techniques, the CNN and DNN architectures used, the data augmentation methods, including the use of Generative Adversarial Networks (GANs), and performance results compared to existing benchmarks when available. This thesis will conclude with a discussion of how one might extend these techniques to construct a commercial off-the-shelf audio scene classifier for detecting multiple UAVs.

Page Count


Department or Program

Department of Computer Science and Engineering

Year Degree Awarded


Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.