Publication Date

2017

Document Type

Thesis

Committee Members

Michelle Cheatham (Committee Member), Keke Chen (Committee Member), Guozhu Dong (Advisor)

Degree Name

Master of Science in Computer Engineering (MSCE)

Abstract

A good distance metric is instrumental to the performance of many tasks, including classification and data retrieval. However, designing an optimal distance function is very challenging, especially when the data is high-dimensional. Recently, a number of algorithms have been proposed to learn an optimal distance function in a supervised manner, using data with class labels. In this thesis we propose methods to learn an optimal distance function that can also indicate the importance of attributes. Specifically, we present several ways to define idealized distance functions: two involve distance error correction based on KNN classification, and another involves a distance function defined by two constants. We then use multiple linear regression to produce regression formulas that represent the idealized distance functions. Experiments indicate that the distances produced by our approaches yield classification accuracy fairly comparable to that of existing methods. Importantly, our methods have the added bonus of using weights on attributes to indicate the importance of attributes in the constructed optimal distance functions. Finally, the thesis presents the importance of attributes on a number of datasets from the UCI repository.
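
The regression step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, and the abstract does not give the exact construction, so the following are assumptions made only for illustration: the two-constant idealized distance is taken to be one constant for same-class pairs and another for different-class pairs, the regression features are per-attribute absolute differences, and all names (`pairwise_features`, `fit_distance_weights`, `learned_distance`, `same_dist`, `diff_dist`, `ridge`) and the toy data are hypothetical.

```python
import itertools


def pairwise_features(X, y, same_dist=0.0, diff_dist=1.0):
    """Build one regression row per pair of points.

    Features: per-attribute absolute differences (an assumption).
    Target: assumed two-constant idealized distance, i.e. same_dist
    for same-class pairs and diff_dist for different-class pairs.
    """
    rows, targets = [], []
    for i, j in itertools.combinations(range(len(X)), 2):
        rows.append([abs(a - b) for a, b in zip(X[i], X[j])])
        targets.append(same_dist if y[i] == y[j] else diff_dist)
    return rows, targets


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_distance_weights(X, y, ridge=1e-8):
    """Fit intercept and attribute weights by multiple linear regression.

    Solves the normal equations; the tiny ridge term is only there to
    keep the system well conditioned.
    """
    rows, targets = pairwise_features(X, y)
    design = [[1.0] + r for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    AtA = [[sum(d[p] * d[q] for d in design) + (ridge if p == q else 0.0)
            for q in range(k)] for p in range(k)]
    Atb = [sum(d[p] * t for d, t in zip(design, targets)) for p in range(k)]
    coef = solve_linear_system(AtA, Atb)
    return coef[0], coef[1:]


def learned_distance(u, v, intercept, weights):
    """Evaluate the fitted regression formula on a pair of points."""
    return intercept + sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [1.0, 0.8], [1.1, 0.2], [0.9, 0.5]]
y = [0, 0, 0, 1, 1, 1]

intercept, weights = fit_distance_weights(X, y)
print("attribute weights:", weights)
```

On this toy data the fitted weight for attribute 0 comes out much larger than the weight for attribute 1, which is the sense in which regression weights can signal attribute importance. The fitted formula is not guaranteed to be non-negative or to satisfy the triangle inequality, so it is better read as a learned dissimilarity score than as a metric in the strict sense.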

Page Count

44

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2017

