Publication Date

2010

Document Type

Thesis

Committee Members

Keke Chen (Committee Member), Guozhu Dong (Advisor), Krishnaprasad Thirunarayan (Committee Member)

Degree Name

Master of Science (MS)

Abstract

The data clustering problem has long received attention in the data mining, machine learning, and pattern recognition communities. Many previous approaches to this problem require a distance function. However, since clustering is highly exploratory and is usually performed on data that are rather new, it is debatable whether users can provide good distance functions for the data. This thesis proposes a Contrast Pattern based Clustering (CPC) algorithm that constructs clusters without a distance function by focusing on the quality and diversity/richness of the contrast patterns that distinguish the clusters in a clustering. Specifically, CPC attempts to maximize the Contrast Pattern based Clustering Quality (CPCQ) index, which can recognize that expert-determined classes are the best clusters for many datasets in the UCI Repository. Experiments using UCI datasets show that clusterings produced by CPC have higher CPCQ scores than those produced by other well-known clustering algorithms. Furthermore, CPC recovers expert clusterings from these datasets with higher accuracy than those algorithms.

Page Count

46

Department or Program

Department of Computer Science

Year Degree Awarded

2010

