Document Type

Article

Publication Date

1-2010

Abstract

This paper presents SCALE, a fully automated transactional clustering framework. The SCALE design highlights three unique features. First, we introduce the concept of Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The concept of weighted coverage density is intuitive, and it allows the weight of each item in a cluster to change dynamically according to the occurrences of items. Second, we develop a clustering algorithm based on the weighted coverage density measure that is fast, memory-efficient, and scalable for analyzing transactional data. Third, we introduce two clustering validation metrics and show that these domain-specific clustering evaluation metrics are critical for capturing the transactional semantics in clustering analysis. Our SCALE framework combines the weighted coverage density measure for clustering over a sample dataset with self-configuring methods. These self-configuring methods can automatically tune two important aspects of our clustering algorithms: (1) the selection of candidates for the best number K of clusters; and (2) the application of two domain-specific cluster validity measures to find the best result from the set of clustering results. We have conducted extensive experimental evaluation using both synthetic and real datasets, and our results show that the weighted coverage density approach powered by the SCALE framework can efficiently generate high-quality clustering results in a fully automated manner.
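
The abstract does not give a formula for weighted coverage density, so the following is only a minimal sketch under one assumed formulation: each item's weight is its share of all item occurrences in the cluster, and the measure is the weighted sum of item occurrences divided by the number of transactions. The function name `weighted_coverage_density` and the list-of-sets input format are illustrative choices, not part of the paper's stated interface.

```python
from collections import Counter


def weighted_coverage_density(cluster):
    """Assumed formulation: sum(occur_j ** 2) / (S * N).

    cluster: list of transactions, each an iterable of hashable items.
    N is the number of transactions, occur_j is the number of transactions
    containing item j, and S is the sum of all occur_j in the cluster.
    """
    # Count each item once per transaction, even if it is listed twice.
    occur = Counter(item for transaction in cluster for item in set(transaction))
    n = len(cluster)
    s = sum(occur.values())
    if n == 0 or s == 0:
        return 0.0
    # Weight of item j is occur_j / s, so frequent items dominate the score
    # and the weights shift as transactions join or leave the cluster.
    return sum(count * count for count in occur.values()) / (s * n)


identical = [{"a", "b"}, {"a", "b"}]
disjoint = [{"a", "b"}, {"c", "d"}]
print(weighted_coverage_density(identical))
print(weighted_coverage_density(disjoint))
```

Under this assumed definition, a cluster of identical transactions scores 1.0, while a cluster whose transactions share no items scores lower (0.5 for the two-transaction example), which matches the intuition that heavily overlapping transactions form a denser cluster.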

Comments

The featured PDF document is the unpublished, peer-reviewed version of this article.

The final publication is available at http://link.springer.com/article/10.1007%2Fs10618-009-0134-5 .

Repository Citation

Yan, H., Chen, K., Liu, L., & Yi, Z. (2010). SCALE: A Scalable Framework for Efficiently Clustering Transactional Data. Data Mining and Knowledge Discovery, 20 (1), 1-27.
https://corescholar.libraries.wright.edu/knoesis/91

DOI

10.1007/s10618-009-0134-5

Included in

Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, Science and Technology Studies Commons

Kno.e.sis Publications

SCALE: A Scalable Framework for Efficiently Clustering Transactional Data
