Exploring Constraints to Efficiently Mine Emerging Patterns from Large High-dimensional Datasets

Document Type

Conference Proceeding

Publication Date


Find in a Library

Catalog Record


Emerging patterns (EPs) were proposed recently to capture changes or differences between datasets: an EP is a multi-variate feature whose support increases sharply from a background dataset to a target dataset, and the support ratio is called its growth rate. Interesting long EPs often have low support; mining such EPs from high-dimensional datasets is a great challenge due to the combinatorial explosion of the number of candidates. We propose a Constraint-based EP Miner, ConsEPMiner, that utilizes two types of constraints for effectively pruning the search space: External constrain ts are user-given minimums on support, growth rate, and growth-rate improvement to confine the resulting EP set. Inherent constraints same subset support, top growth rate, and same origin -- are derived from the properties of EPs and datasets, and are solely for pruning the search space and saving computation. ConsEPMiner can efficiently mine all EPs at low support on large high- dimensional datasets, with low minimums on growth rate and growth-rate improvement. In comparison, the widely known Apriori-like approach is ineffective on high-dimensional data. While ConsEPMiner adopts several ideas from Dense-Miner [4], a recent constrain t-based association rule miner, its main new contributions are the introduction of inherent constrain ts and the ways to use them together with external- constrain ts for efficient EP mining from dense datasets. Experiments on dense data show that, at low support, ConsEPMiner outperforms the Apriori-like approach by orders of magnitude and is more than twice as fast as the Dense- Miner approach.


Presented at the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, August 20-23, 2000.



Catalog Record