Keke Chen (Committee Member), Guozhu Dong (Advisor), Pascal Hitzler (Committee Member), Zhiqiang Wu (Committee Member)
Doctor of Philosophy (PhD)
This dissertation studies the problem of mining shared and alignable difference knowledge structures across multiple datasets or applications. Such structures are important for identifying analogies between application domains, for forming new hypotheses in challenging research applications, and for assessing the degree and types of knowledge-level similarities and differences between application domains for use in learning transfer. Generally speaking, shared knowledge structures characterize the underlying datasets and highlight conceptual-level structural similarities among them. This dissertation studies the mining of shared decision trees, which are a special type of shared knowledge structure. We first consider building one shared decision tree with high classification accuracy and high data distribution similarity for two given datasets. However, a single shared decision tree may present only a limited view of the behaviors shared by the two datasets. To help users select from multiple diversified perspectives on shared knowledge structures, we propose the diversified decision tree set mining problem, whose goal is to mine a small set of k diversified, high-quality shared decision trees. Besides requiring each tree in the set to have high classification accuracy and highly similar data distributions in the given datasets, the trees in the set are also required to be highly different from each other. Algorithms are developed to solve both problems. Experimental results on microarray datasets for medicine are reported to evaluate the algorithms, together with the mined shared decision trees. This dissertation also introduces and studies the mining of alignable differences. Roughly speaking, alignable difference knowledge structures indicate significant differences in the context of a large number of similarities between two given datasets.
This dissertation considers alignable differences in the form of cross-domain decision trees. An algorithm for mining such trees is presented, and experimental results on microarray datasets for medicine are reported to evaluate it.
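
The two quality criteria the abstract names for a shared decision tree, high classification accuracy on both datasets and similar data distributions across them, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the tuple encoding of the tree, the requirement that both datasets use the same feature columns, taking the lower of the two accuracies, and measuring distribution similarity as one minus the total variation distance between the per-leaf fractions of each dataset.

```python
from collections import Counter

# Hypothetical tree encoding (an assumption, not the dissertation's format):
#   internal node: (feature_index, threshold, left_subtree, right_subtree)
#   leaf:          ("leaf", leaf_id, predicted_label)

def route(tree, x):
    """Follow the splits until a leaf is reached; return (leaf_id, label)."""
    while tree[0] != "leaf":
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree[1], tree[2]

def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(route(tree, x)[1] == y for x, y in zip(rows, labels))
    return hits / len(rows)

def leaf_distribution(tree, rows, leaf_ids):
    """Fraction of rows that end up in each leaf, in leaf_ids order."""
    counts = Counter(route(tree, x)[0] for x in rows)
    return [counts[i] / len(rows) for i in leaf_ids]

def distribution_similarity(p, q):
    """1 - total variation distance: 1.0 if identical, 0.0 if disjoint."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def shared_tree_score(tree, leaf_ids, dataset1, dataset2):
    """Return (worst-case accuracy, leaf-distribution similarity)."""
    rows1, labels1 = dataset1
    rows2, labels2 = dataset2
    acc = min(accuracy(tree, rows1, labels1), accuracy(tree, rows2, labels2))
    sim = distribution_similarity(
        leaf_distribution(tree, rows1, leaf_ids),
        leaf_distribution(tree, rows2, leaf_ids),
    )
    return acc, sim

# Toy example: one split on feature 0 at 0.5, two leaves.
toy_tree = (0, 0.5, ("leaf", 0, "A"), ("leaf", 1, "B"))
toy_d1 = ([[0.2], [0.9], [0.1], [0.8]], ["A", "B", "A", "B"])
toy_d2 = ([[0.3], [0.7], [0.6], [0.9]], ["A", "B", "A", "B"])
print(shared_tree_score(toy_tree, [0, 1], toy_d1, toy_d2))  # (0.75, 0.75)
```

Under these assumptions, a candidate tree scores well only if it classifies both datasets accurately and partitions them in similar proportions. A diversified set of k trees would additionally need a tree-to-tree difference measure, which this sketch does not attempt.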
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright 2013, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.