An Equivalence Class Based Clustering Algorithm for Categorical Data
Document Type
Conference Proceeding
Publication Date
10-2011
Abstract
Most traditional clustering methods rely on a distance function. However, distances between categorical values are hard to define, especially in exploratory settings where the data is not well understood. As a result, many clustering methods perform poorly on categorical datasets. In this paper, we propose a novel Equivalence Class based Clustering Algorithm for Categorical data (ECCC). ECCC takes the support transaction sets of selected frequent closed patterns as the candidate clusters. We define a novel quality measure to evaluate how suitable a frequent closed pattern is for forming a cluster. The measure is based on two factors: cluster coherence, expressed in terms of closed patterns, and cluster discrimination, expressed in terms of the quality and diversity of minimal generator patterns. ECCC uses this measure to select the high-quality frequent closed patterns that form the final clusters.
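The terms used in the abstract (support transaction set, closed pattern, minimal generator, equivalence class) can be illustrated with a minimal Python sketch. This is not the ECCC implementation: the abstract does not define the quality measure, so the sketch stops at enumerating candidate clusters. The toy dataset, the function names, and the brute-force enumeration are all assumptions made for illustration, and the empty pattern is skipped.

```python
from itertools import combinations

# Hypothetical toy dataset: each row is a set of attribute=value items.
ROWS = [
    {"color=red", "shape=round", "size=small"},
    {"color=red", "shape=round", "size=large"},
    {"color=blue", "shape=square", "size=small"},
    {"color=blue", "shape=square", "size=large"},
    {"color=red", "shape=round", "size=small"},
]


def support_set(pattern, rows):
    """Indices of the rows that contain every item of the pattern."""
    return frozenset(i for i, row in enumerate(rows) if pattern <= row)


def closure(pattern, rows):
    """Largest pattern shared by all rows that support `pattern`."""
    tids = support_set(pattern, rows)
    if not tids:
        return frozenset(pattern)
    return frozenset(set.intersection(*(rows[i] for i in tids)))


def equivalence_classes(rows, min_support):
    """Group frequent patterns by support set (brute force, toy data only).

    Returns a list of (row_indices, closed_pattern, minimal_generators).
    Patterns with the same support set form one equivalence class; the
    support set is the candidate cluster.
    """
    items = sorted(set().union(*rows))
    classes = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            tids = support_set(pattern, rows)
            if len(tids) >= min_support:
                classes.setdefault(tids, []).append(pattern)

    result = []
    for tids, patterns in classes.items():
        # The closed pattern is the unique maximal member of the class.
        closed = closure(patterns[0], rows)
        # Minimal generators have no proper subset inside the same class.
        generators = [p for p in patterns if not any(q < p for q in patterns)]
        result.append((sorted(tids), closed, generators))
    return result


if __name__ == "__main__":
    for tids, closed, generators in equivalence_classes(ROWS, min_support=2):
        print(tids, sorted(closed), [sorted(g) for g in generators])
```

In this toy data, rows 0, 1, and 4 form one candidate cluster: its closed pattern is {color=red, shape=round}, and either item alone is a minimal generator. A selection step such as the one the abstract describes would then rank these candidates by coherence and discrimination; that ranking is not shown here.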
Repository Citation
Qingbao, L., Wanjun, W., & Su, D. (2011). An Equivalence Class Based Clustering Algorithm for Categorical Data. Proceedings of the First International Conference on Advances in Information Mining and Management, 127-130.
https://corescholar.libraries.wright.edu/knoesis/384
Comments
Presented at the First International Conference on Advances in Information Mining and Management, Barcelona, Spain, October 23-29, 2011.