Combining the Strength of Pattern Frequency and Distance for Classification
Document Type
Conference Proceeding
Publication Date
4-2001
Abstract
Supervised classification has been approached through many heuristics, including decision trees, k-nearest neighbour (k-NN), pattern frequency, neural networks, and Bayesian rules, as bases for induction algorithms. In this paper, we propose a new instance-based induction algorithm that combines the strengths of pattern frequency and distance. We define a neighbourhood of a test instance. If the neighbourhood contains training data, we use k-NN to make the decision. Otherwise, we examine the support (frequency) of certain types of subsets of the test instance and calculate support summations for prediction. This scheme is intended to deal with outliers: when no training data is near a test instance, the distance measure is not a proper predictor for classification. We present an effective method to choose an “optimal” neighbourhood factor for a given data set, guided by partial training data. We find that our algorithm maintains (and sometimes exceeds) the outstanding accuracy of k-NN on data sets containing purely continuous attributes, and that it greatly improves on the accuracy of k-NN on data sets containing a mixture of continuous and categorical attributes. In general, our method is much superior to C5.0.
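The hybrid decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the neighbourhood factor `r`, the Euclidean distance, and the fallback scoring (summing per-class supports of the test instance's individual rounded attribute values, a stand-in for the paper's subset-support summation) are all assumptions made here for concreteness.

```python
import numpy as np
from collections import Counter

def classify(x, X_train, y_train, k=3, r=0.1):
    """Hybrid classifier: k-NN inside a neighbourhood, support sums outside.

    r is a hypothetical neighbourhood factor (a fraction of the largest
    distance from x to the training data); the paper instead tunes an
    "optimal" factor using guidance from partial training data, which is
    not reproduced here.
    """
    d = np.linalg.norm(X_train - x, axis=1)
    radius = r * d.max()
    if (d <= radius).any():
        # Neighbourhood contains training data: ordinary k-NN majority vote.
        idx = np.argsort(d)[:k]
        return Counter(y_train[idx]).most_common(1)[0][0]
    # Outlier case: distance is unreliable, so score each class by summed
    # supports of the test instance's (discretised) attribute values --
    # an illustrative simplification of the paper's subset-support scheme.
    X_bins, x_bins = np.round(X_train, 1), np.round(x, 1)
    scores = {}
    for c in set(y_train):
        Xc = X_bins[y_train == c]
        scores[c] = sum((Xc[:, j] == x_bins[j]).mean() for j in range(len(x)))
    return max(scores, key=scores.get)
```

For a test instance surrounded by training data the rule reduces to plain k-NN; only for outliers does the frequency-based fallback take over.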
Repository Citation
Li, J., Ramamohanarao, K., & Dong, G. (2001). Combining the Strength of Pattern Frequency and Distance for Classification. Lecture Notes in Computer Science, 2035, 455-466.
https://corescholar.libraries.wright.edu/knoesis/420
DOI
10.1007/3-540-45357-1_48
Comments
Presented at the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2001), Hong Kong, April 16-18, 2001.