Michelle Cheatham (Committee Member), Keke Chen (Committee Member), Guozhu Dong (Advisor)
Master of Science in Computer Engineering (MSCE)
A good distance metric is instrumental to the performance of many tasks, including classification and data retrieval. However, designing an optimal distance function is very challenging, especially for high-dimensional data. Recently, a number of algorithms have been proposed to learn an optimal distance function in a supervised manner, using data with class labels. In this thesis, we propose methods to learn an optimal distance function that also indicates the importance of attributes. Specifically, we present several ways to define idealized distance functions: two of them apply distance error correction based on KNN classification, and another uses a distance function defined by two constants. We then use multiple linear regression to produce regression formulas that represent the idealized distance functions. Experiments indicate that the distances produced by our approaches achieve classification accuracy fairly comparable to that of existing methods. Importantly, our methods have the added bonus of assigning weights to attributes, and these weights indicate the importance of each attribute in the constructed optimal distance functions. Finally, the thesis reports the importance of attributes for a number of datasets from the UCI repository.
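The general idea of fitting a regression formula to an idealized distance can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's exact formulation. It assumes that the two-constant idealized distance assigns a constant `c_same` to pairs with the same class label and `c_diff` to pairs with different labels. It also assumes that the regression features are the per-attribute absolute differences `|x_k - y_k|` plus an intercept. The function names, the default constants, the toy data, and the tiny ridge term are hypothetical choices made for illustration. The fitted weights then serve as attribute-importance scores, and the learned distance is used for KNN classification.

```python
from collections import Counter
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_distance_weights(X, labels, c_same=0.0, c_diff=1.0, ridge=1e-6):
    """Hypothetical helper: least-squares fit of b + sum_k w_k |x_k - y_k| to an
    assumed two-constant idealized distance (c_same / c_diff) over all pairs."""
    rows, targets = [], []
    for i, j in combinations(range(len(X)), 2):
        rows.append([1.0] + [abs(a - b) for a, b in zip(X[i], X[j])])
        targets.append(c_same if labels[i] == labels[j] else c_diff)
    p = len(rows[0])
    # Normal equations (A^T A + ridge*I) w = A^T t; the ridge skips the intercept.
    ata = [[sum(r[u] * r[v] for r in rows) + (ridge if u == v and u > 0 else 0.0)
            for v in range(p)] for u in range(p)]
    att = [sum(r[u] * t for r, t in zip(rows, targets)) for u in range(p)]
    w = solve_linear_system(ata, att)
    return w[0], w[1:]  # intercept, per-attribute weights


def learned_distance(x, y, intercept, weights):
    """Weighted L1-style distance given by the fitted regression formula."""
    return intercept + sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


def knn_predict(query, X, labels, intercept, weights, k=3):
    """Majority vote among the k training points nearest under the learned distance."""
    ranked = sorted(range(len(X)),
                    key=lambda i: learned_distance(query, X[i], intercept, weights))
    return Counter(labels[i] for i in ranked[:k]).most_common(1)[0][0]


# Toy data: attribute 0 separates the classes; attribute 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [1.1, 0.8], [0.9, 0.1]]
labels = ["a", "a", "a", "b", "b", "b"]
intercept, weights = fit_distance_weights(X, labels)
print(weights)  # attribute 0 receives the much larger weight
print(knn_predict([0.05, 0.6], X, labels, intercept, weights))  # prints a
```

The intercept does not affect the KNN ranking because it shifts every distance equally. The weights are comparable as importance scores only when the attributes are on similar scales. Normalizing each attribute beforehand is a sensible assumption here.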
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright 2017, all rights reserved. My ETD will be available under the "Fair Use" terms of copyright law.