Publication Date
2017
Document Type
Thesis
Committee Members
Keke Chen (Committee Member), Guozhu Dong (Committee Chair), Derek Doran (Committee Member)
Degree Name
Master of Science (MS)
Abstract
Correlation analysis is a frequently used statistical technique for examining the relationship among variables in many practical applications. However, traditional correlation analysis takes an overly simplistic approach: it measures how two variables are related by examining their relationship only over the entire underlying data space. As a result, it may miss a strong correlation between those variables, especially when that relationship exists only in a small subpopulation of the larger data space. This limitation is no longer acceptable in the era of Big Data, where data are often highly diverse and can differ noticeably within the same application, because it can cause a significant amount of information to be lost. To remedy this situation, this thesis introduces a new approach called Conditional Correlation Analysis (CCR). Instead of computing the correlation among variables over the entire data space, this approach first divides the data space into multiple subpopulations using patterns. It then computes the correlation within each subpopulation and identifies subpopulations whose correlation strength differs greatly from that of the global population. The thesis also introduces the concept of CCRs and methods for mining them, provides measures for evaluating the unusualness of CCRs, and presents experiments that evaluate and illustrate the CCR approach in financial and medical applications.
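The core idea described above has three steps: partition the data by patterns, compute the correlation inside each subpopulation, and compare it with the global correlation. A minimal Python sketch of those steps follows. It is not the thesis's mining algorithm. The function names `pearson` and `conditional_correlations`, the restriction of patterns to single attribute=value conditions, the `min_support` threshold, and the use of the absolute difference from the global Pearson correlation as an unusualness score are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def conditional_correlations(records, attrs, x_key, y_key, min_support=3):
    """Compare the correlation of (x_key, y_key) in each subpopulation with the global one.

    Hypothetical simplification: each "pattern" is a single attribute=value
    condition, and subpopulations with fewer than min_support records are skipped.
    Returns (global_r, [(pattern, local_r, gap), ...]) sorted by gap, largest first.
    """
    global_r = pearson([r[x_key] for r in records], [r[y_key] for r in records])
    if global_r is None:
        return None, []  # correlation is undefined on the whole data space
    # Group records into subpopulations, one per (attribute, value) pattern.
    groups = defaultdict(list)
    for r in records:
        for a in attrs:
            groups[(a, r[a])].append(r)
    results = []
    for pattern, rows in groups.items():
        if len(rows) < min_support:
            continue
        local_r = pearson([r[x_key] for r in rows], [r[y_key] for r in rows])
        if local_r is None:
            continue
        # Assumed unusualness score: distance from the global correlation.
        results.append((pattern, local_r, abs(local_r - global_r)))
    results.sort(key=lambda t: t[2], reverse=True)
    return global_r, results


if __name__ == "__main__":
    # Toy data: x and y rise together in region A and move oppositely in region B.
    data = [{"region": "A", "x": v, "y": v} for v in (1, 2, 3, 4)]
    data += [{"region": "B", "x": v, "y": 5 - v} for v in (1, 2, 3, 4)]
    g, res = conditional_correlations(data, ["region"], "x", "y")
    print(g)  # 0.0: no correlation over the whole data space
    for pattern, r, gap in res:
        print(pattern, r, gap)  # each region shows a perfect (+1 or -1) correlation
```

On this toy data the global correlation is 0.0, while the two subpopulations have correlations of 1.0 and -1.0. This is exactly the kind of relationship that a single global measurement hides.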
Page Count
46
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2017
Copyright
Copyright 2017, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.