Document Type

Presentation

Publication Date

2005

Abstract

With the growing demand for cluster analysis of categorical data, a handful of categorical clustering algorithms have been developed. Surprisingly, to our knowledge, none has satisfactorily addressed an important problem in categorical clustering: how can we determine the best number of clusters, K, for a categorical dataset? Since categorical data has no inherent distance function to serve as a similarity measure, traditional cluster validation techniques based on geometric shape and density distribution cannot be applied to answer this question. In this paper, we investigate the entropy property of categorical data and propose the BkPlot method for determining a set of candidate “best Ks”. The method is implemented with a hierarchical clustering algorithm, ACE. The experimental results show that our approach can effectively identify the significant clustering structures.
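As a rough illustration of the entropy criterion the abstract refers to, the sketch below scores a partition of categorical records by its size-weighted entropy. It is not the BkPlot method or the ACE algorithm, neither of which is specified on this page. The function names are hypothetical, and the code assumes, as a simplification, that attributes are independent, so a cluster's entropy is the sum of its per-column entropies.

```python
from collections import Counter
from math import log2


def column_entropy(values):
    """Shannon entropy (in bits) of one categorical column."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def cluster_entropy(rows):
    """Entropy of one cluster: sum of per-column entropies.

    Assumption: attributes are treated as independent.
    """
    if not rows:
        return 0.0
    return sum(column_entropy(col) for col in zip(*rows))


def expected_entropy(rows, labels):
    """Size-weighted average of cluster entropies for a given partition."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    n = len(rows)
    return sum(len(members) / n * cluster_entropy(members)
               for members in clusters.values())


# Toy data: two attributes, two obvious groups.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
print(expected_entropy(rows, [0, 0, 0, 0]))  # one cluster holding everything
print(expected_entropy(rows, [0, 0, 1, 1]))  # the two natural groups
```

On this toy data the two-cluster partition has lower expected entropy than the single cluster. Comparing such scores across different values of K is the kind of signal an entropy-based "best K" analysis could start from.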

Comments

This paper was presented at the Scientific and Statistical Database Management Conference (SSDBM05), Santa Barbara, CA, June 2005.

Repository Citation

Chen, K., & Liu, L. (2005). The "Best K" for Entropy-based Categorical Data Clustering.
https://corescholar.libraries.wright.edu/knoesis/176

Included in

Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, Science and Technology Studies Commons

Kno.e.sis Publications

The "Best K" for Entropy-based Categorical Data Clustering