Mining Diversified Shared Decision Tree Sets for Discovering Cross Domain Similarities

Document Type

Conference Proceeding

Publication Date




This paper studies the problem of mining diversified sets of shared decision trees (SDTs). Given two datasets representing two application domains, an SDT is a decision tree that can perform classification on both datasets and that captures class-based population-structure similarity between them. Previous studies considered mining just one SDT. The present paper considers mining a small diversified set of SDTs with two properties: (1) each SDT in the set has high quality with regard to “shared” accuracy and population-structure similarity, and (2) the SDTs in the set differ substantially from one another. A diversified set of SDTs can serve as a concise representative of the huge space of possible cross-domain similarities, offering users an effective way to examine and select informative SDTs from that space. The diversity of an SDT set is measured in terms of the difference in attribute usage among the SDTs. The paper provides algorithms for mining diversified sets of SDTs, and experimental results show that the algorithms are effective and can find diversified sets of high-quality SDTs.
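The abstract says diversity is measured by differences in attribute usage among the SDTs but does not give the formula. The sketch below shows one way such a measure could work, under assumptions that are not taken from the paper: each tree is reduced to the set of attributes it splits on, the distance between two trees is the Jaccard distance of those sets, and a small set is chosen greedily. The function names, the (name, quality, attributes) tuple layout, and the greedy rule are all hypothetical illustrations, not the paper's algorithms.

```python
from itertools import combinations


def attribute_usage_distance(attrs_a, attrs_b):
    """Jaccard distance between the attribute sets of two trees (assumed measure)."""
    a, b = set(attrs_a), set(attrs_b)
    union = a | b
    if not union:
        return 0.0  # two trees that use no attributes are treated as identical
    return 1.0 - len(a & b) / len(union)


def set_diversity(attribute_sets):
    """Average pairwise attribute-usage distance over a set of trees."""
    pairs = list(combinations(attribute_sets, 2))
    if not pairs:
        return 0.0
    return sum(attribute_usage_distance(x, y) for x, y in pairs) / len(pairs)


def greedy_diverse_subset(candidates, k):
    """Pick up to k candidates, each a (name, quality, attributes) tuple.

    Hypothetical rule: start from the highest-quality tree, then repeatedly
    add the tree whose minimum distance to the already selected trees is
    largest, breaking ties by quality.
    """
    remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not remaining or k <= 0:
        return []
    selected = [remaining.pop(0)]
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: (
                min(attribute_usage_distance(c[2], s[2]) for s in selected),
                c[1],
            ),
        )
        remaining.remove(best)
        selected.append(best)
    return selected


# Made-up candidates for illustration only; not data from the paper.
candidates = [
    ("T1", 0.90, {"age", "income"}),
    ("T2", 0.88, {"age", "income", "region"}),
    ("T3", 0.80, {"region", "tenure"}),
]
chosen = greedy_diverse_subset(candidates, k=2)
chosen_names = [name for name, _, _ in chosen]
```

With these made-up candidates, the second pick is the tree that shares no attributes with the first, even though another candidate has higher quality. That is the trade-off between per-tree quality and set diversity that the abstract describes.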


Presented at the 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Tainan, Taiwan, May 13-16, 2014.


