Kno.e.sis Publications

Efficiently Clustering Transactional Data with Weighted Coverage Density

Hua Yan, Wright State University - Main Campus
Keke Chen, Wright State University - Main Campus
Ling Liu

Document Type

Article

Publication Date

2006

Abstract

It is widely recognized that developing efficient and fully automated algorithms for clustering large transactional datasets is a challenging problem. In this paper, we propose a fast, memory-efficient, and scalable clustering algorithm for analyzing transactional data. Our approach has three unique features. First, we use the concept of Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The concept of weighted coverage density is intuitive and allows the weight of each item in a cluster to change dynamically according to the occurrences of items. Second, we develop two evaluation metrics specific to transactional data clustering, based on the concept of large transactional items and on coverage density, respectively. Third, we implement the weighted coverage density clustering algorithm and the two clustering validation metrics in a fully automated transactional clustering framework called SCALE (Sampling, Clustering structure Assessment, cLustering and domain-specific Evaluation). The SCALE framework combines the weighted coverage density measure, applied to clustering over a sample dataset, with self-configuring methods that automatically tune two important parameters of the clustering algorithms: (1) the candidates for the best number K of clusters; and (2) the application of two domain-specific cluster validity measures to select the best result from the set of clustering results. We have conducted an experimental evaluation using both synthetic and real datasets, and our results show that the weighted coverage density approach powered by the SCALE framework can efficiently generate high-quality clustering results in a fully automated manner.
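
To make the idea of an occurrence-based item weight concrete, here is a minimal sketch. It assumes that a cluster's coverage density is the fraction of filled cells in its transaction-by-item matrix, S / (N * M), where N is the number of transactions, M the number of distinct items and S the total number of item occurrences. It also assumes that the weighted variant gives each item a weight equal to its share of all occurrences (occ / S) and multiplies that weight by the item's coverage (occ / N). These formulas are an illustrative reading of the abstract, not necessarily the paper's exact definitions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def _item_counts(cluster: Iterable[Set[Hashable]]) -> tuple[int, Counter]:
    """Return (number of transactions, occurrence count per item)."""
    transactions = list(cluster)
    counts = Counter(item for t in transactions for item in t)
    return len(transactions), counts


def coverage_density(cluster: Iterable[Set[Hashable]]) -> float:
    """Assumed unweighted coverage density: S / (N * M)."""
    n, counts = _item_counts(cluster)
    if n == 0 or not counts:
        return 0.0
    s = sum(counts.values())  # total item occurrences S
    m = len(counts)           # distinct items M
    return s / (n * m)


def weighted_coverage_density(cluster: Iterable[Set[Hashable]]) -> float:
    """Assumed weighted variant: sum over items of (occ / S) * (occ / N).

    The weight occ / S is recomputed from the current counts, so it
    shifts whenever transactions join or leave the cluster.
    """
    n, counts = _item_counts(cluster)
    if n == 0 or not counts:
        return 0.0
    s = sum(counts.values())
    return sum((occ / s) * (occ / n) for occ in counts.values())


if __name__ == "__main__":
    cluster = [{"a", "b"}, {"a", "c"}, {"a"}]
    print(coverage_density(cluster))           # 5 / 9
    print(weighted_coverage_density(cluster))  # 11 / 15
```

In the example cluster, item `a` appears in every transaction. The weighted measure (11/15, about 0.73) scores the cluster higher than the unweighted one (5/9, about 0.56) because frequent items receive more weight, while the rare items `b` and `c` pull it down less.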

Comments

This paper was presented at the ACM Conference on Information and Knowledge Management (CIKM 2006), November 2006, Arlington, VA.

Repository Citation

Yan, H., Chen, K., & Liu, L. (2006). Efficiently Clustering Transactional Data with Weighted Coverage Density. CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, 367-376.
https://corescholar.libraries.wright.edu/knoesis/175

DOI

10.1145/1183614.1183668
