Publication Date
2018
Document Type
Thesis
Committee Members
John C. Gallagher (Advisor), Mateen M. Rizki (Committee Member), Thomas Wischgoll (Committee Member)
Degree Name
Master of Science (MS)
Abstract
Classification of environmental scenes and detection of events in one's environment from audio signals enable one to create better planning agents, intelligent navigation systems, pattern recognition systems, and audio surveillance systems. This thesis explores the use of Convolutional Neural Networks (CNNs) with spectrograms and raw audio waveforms as inputs, and of Deep Neural Networks (DNNs) with hand-engineered features extracted from large-scale feature extraction schemes, to identify acoustic scenes and events. The first part focuses on building an audio pattern recognition system capable of detecting whether there are zero, one, or two DJI Phantoms in the scene within the range of a stereo microphone. The ability to distinguish the presence of multiple UAVs could be used to augment information from other sensors less capable of making such determinations. The second part of the thesis focuses on building an acoustic scene detector for Task 1a in the DCASE2018 challenge (http://dcase.community/challenge2018/index). In both cases, this document explains the pre-processing techniques, the CNN and DNN architectures used, the data augmentation methods, including the use of Generative Adversarial Networks (GANs), and performance results compared to existing benchmarks when available. The thesis concludes with a discussion of how one might extend these techniques to construct a commercial off-the-shelf audio scene classifier for detecting multiple UAVs.
Page Count
148
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2018
Copyright
Copyright 2018, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.