Publication Date

2010

Document Type

Thesis

Committee Members

Keke Chen (Committee Member), Guozhu Dong (Advisor), Pascal Hitzler (Committee Member)

Degree Name

Master of Science in Computer Engineering (MSCE)

Abstract

This thesis studies the problem of mining models, patterns and structures (MPS) shared by two datasets (applications): a well-understood dataset, denoted WD, and a poorly understood one, denoted PD. Combined with users' familiarity with WD, the shared MPS can help users better understand PD, since they capture similarities between WD and PD. Moreover, knowledge of such similarities enables users to focus their attention on analyzing the unique behavior of PD. Technically, this thesis focuses on the shared decision tree mining problem. To provide a view of the similarities between WD and PD, it proposes to mine a high-quality shared decision tree satisfying two properties: the tree has (1) highly similar data distributions and (2) high classification accuracy in the two datasets. The thesis proposes an algorithm, SDT-Miner, for mining such a shared decision tree. This algorithm differs significantly from traditional decision tree mining, since it addresses the challenges caused by the presence of two datasets, by the data distribution similarity requirement, and by the tree accuracy requirement. The effectiveness of the algorithm is verified by experiments.

Page Count

52

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2010

