Document Type

Article

Publication Date

1-2009

Abstract

Determining the optimal number of clusters is an important but poorly understood problem in cluster analysis. In this paper, we propose a novel method for finding a set of candidate optimal values of K, the number of clusters, in transactional datasets. Concretely, we propose the Transactional-cluster-modes Dissimilarity, based on the concept of coverage density, as an intuitive measure of dissimilarity between transactional clusters. Based on this measure, we develop an agglomerative hierarchical clustering algorithm and use the Merge Dissimilarity Indexes, which are generated during the hierarchical cluster merging process, to find the candidate optimal values of K for transactional data. Our experimental results on both synthetic and real data show that the new method often estimates the number of clusters in transactional data effectively.
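The general idea of reading candidate Ks off an agglomerative merge trace can be illustrated with a minimal sketch. The abstract does not define coverage density or the Transactional-cluster-modes Dissimilarity, so everything below is an assumption made for illustration: `coverage_density` is taken to be the filled fraction of a cluster's transaction-by-item grid, `merge_dissimilarity` is a hypothetical stand-in (the drop in size-weighted coverage density caused by a merge), and `candidate_ks` simply ranks K by the jump in merge cost. It is not the authors' algorithm.

```python
from itertools import combinations


def coverage_density(cluster):
    """Filled fraction of the cluster's transaction-by-item grid.

    Assumed definition; transactions must be non-empty sets of items.
    """
    items = set().union(*cluster)
    filled = sum(len(t) for t in cluster)
    return filled / (len(cluster) * len(items))


def merge_dissimilarity(a, b):
    """Hypothetical stand-in measure, NOT the paper's dissimilarity:
    the drop in size-weighted coverage density if a and b are merged."""
    na, nb = len(a), len(b)
    weighted = (na * coverage_density(a) + nb * coverage_density(b)) / (na + nb)
    return weighted - coverage_density(a + b)


def merge_trace(transactions):
    """Greedy agglomerative merging; records (K before merge, merge cost)."""
    clusters = [[frozenset(t)] for t in transactions]
    trace = []
    while len(clusters) > 1:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_dissimilarity(clusters[p[0]], clusters[p[1]]),
        )
        cost = merge_dissimilarity(clusters[i], clusters[j])
        trace.append((len(clusters), cost))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return trace


def candidate_ks(trace, top=1):
    """Rank K by how sharply the merge cost rises at the merge that leaves K."""
    jumps = [
        (trace[n][1] - trace[n - 1][1], trace[n][0])
        for n in range(1, len(trace))
    ]
    jumps.sort(reverse=True)
    return [k for _, k in jumps[:top]]


if __name__ == "__main__":
    data = [{"a", "b"}] * 3 + [{"x", "y"}] * 3
    print(candidate_ks(merge_trace(data)))
```

On this toy input, merges within each group of identical transactions cost nothing under the stand-in measure, while the final merge of the two groups halves the coverage density, so the largest jump points at K = 2.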

Comments

The article posted is the authors' preprint version.

Repository Citation

Yan, H., Chen, K., & Liu, L. (2009). Determining the Best K for Clustering Transactional Datasets: A Coverage Density-based Approach. Data and Knowledge Engineering, 68 (1), 28-48.
https://corescholar.libraries.wright.edu/knoesis/114

DOI

10.1016/j.datak.2008.08.005

Included in

Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, Science and Technology Studies Commons
