Publication Date

2017

Document Type

Thesis

Committee Members

Keke Chen (Committee Member), Guozhu Dong (Committee Chair), Derek Doran (Committee Member)

Degree Name

Master of Science (MS)

Abstract

Correlation analysis is a frequently used statistical technique for examining the relationships among variables in many practical applications. However, traditional correlation analysis takes an overly simplistic approach: it measures how two variables are related by examining their relationship only over the entire underlying data space. As a result, it may miss a strong correlation between those variables, especially when that correlation exists only in a small subpopulation of the data. This limitation is increasingly costly in the era of Big Data, where datasets are often highly diverse and data within the same application can differ noticeably from one subpopulation to another, so ignoring those differences can mean losing a substantial share of information. To remedy this situation, this thesis introduces a new approach called Conditional Correlation Analysis (CCR). Instead of computing the correlation among variables over the entire data space, this approach first uses patterns to divide the data space into multiple subpopulations. It then computes the correlation within each subpopulation and identifies the subpopulations whose correlation strength differs markedly from that of the global population. Moreover, the thesis introduces the concept of CCRs and methods for mining them, provides measures for evaluating the unusualness of CCRs, and presents experiments that evaluate and illustrate the CCR approach in financial and medical applications.
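The procedure outlined in the abstract (partition the data with patterns, compute the correlation in each subpopulation, flag subpopulations that deviate from the global correlation) can be sketched in Python as follows. This is a minimal illustration under assumptions that are not taken from the thesis: a "pattern" is a single attribute = value condition, correlation is Pearson's r, and unusualness is the absolute difference between a subpopulation's r and the global r. The function names `pearson` and `conditional_correlations` and the `min_size`/`min_diff` thresholds are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def conditional_correlations(rows, x, y, attrs, min_size=3, min_diff=0.5):
    """Hypothetical sketch of conditional correlation mining.

    Assumption: each "pattern" is one attribute=value condition, and a
    subpopulation is reported when |local r - global r| >= min_diff.
    """
    global_r = pearson([r[x] for r in rows], [r[y] for r in rows])
    if global_r is None:
        return None, []
    found = []
    for attr in attrs:
        for value in sorted({r[attr] for r in rows}):
            # Subpopulation selected by the pattern attr == value.
            sub = [r for r in rows if r[attr] == value]
            if len(sub) < min_size:
                continue  # too few rows for a meaningful correlation
            local_r = pearson([r[x] for r in sub], [r[y] for r in sub])
            if local_r is None:
                continue
            diff = abs(local_r - global_r)
            if diff >= min_diff:
                found.append(((attr, value), len(sub), local_r, diff))
    # Most unusual subpopulations first (stable sort keeps ties in order).
    found.sort(key=lambda t: t[3], reverse=True)
    return global_r, found


if __name__ == "__main__":
    rows = (
        [{"group": "A", "x": v, "y": v} for v in (1, 2, 3, 4)]
        + [{"group": "B", "x": v, "y": 5 - v} for v in (1, 2, 3, 4)]
    )
    global_r, found = conditional_correlations(rows, "x", "y", ["group"])
    print("global r:", global_r)
    for pattern, size, local_r, diff in found:
        print(pattern, size, local_r, diff)
```

In this toy data the global correlation between x and y is 0, yet group A is perfectly positively correlated and group B perfectly negatively correlated: exactly the kind of subpopulation-level relationship that a single global correlation would miss.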

Page Count

46

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2017
