Keke Chen (Committee Member), Guozhu Dong (Advisor), Krishnaprasad Thirunarayan (Committee Member)
Master of Science (MS)
The data clustering problem has long received attention in the data mining, machine learning, and pattern recognition communities. Many existing approaches to this problem require a distance function. However, because clustering is highly exploratory and is usually performed on data that are relatively new to the user, it is debatable whether users can provide good distance functions for such data. This thesis proposes a Contrast Pattern based Clustering (CPC) algorithm that constructs clusters without a distance function by focusing on the quality and diversity (richness) of the contrast patterns that distinguish the clusters in a clustering. Specifically, CPC attempts to maximize the Contrast Pattern based Clustering Quality (CPCQ) index, which can recognize that expert-determined classes are the best clusters for many datasets in the UCI Repository. Experiments on UCI datasets show that CPCQ scores are higher for clusterings produced by CPC than for those produced by other well-known clustering algorithms. Furthermore, CPC recovers the expert clusterings of these datasets with higher accuracy than those algorithms.
Department or Program
Department of Computer Science
Year Degree Awarded
Copyright 2010, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.