Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm

Document Type

Conference Proceeding

Publication Date



Statistical pattern recognition techniques classify objects in terms of a representative set of features. The choice of which features to measure and include can significantly affect the cost and accuracy of an automated classifier. Our previous research has shown that a hybrid of a k-nearest-neighbors (knn) classifier and a genetic algorithm (GA) can achieve greater classification accuracy than knn alone by weighting features during knn classification. Here we describe an extension to this approach that further enhances feature selection by including a masking vector on the GA chromosome, so that feature weights are optimized and key features are selected simultaneously. We present the results of our GA/knn feature selection method on two important problems from biochemistry and medicine: identification of conserved water molecules bound to protein surfaces, and diagnosis of thyroid deficiency. Because the GA can explore the effect of eliminating a feature from the classification without losing the weight knowledge already learned, the feature masking technique lets the GA/knn efficiently examine noisy, complex, high-dimensional datasets to find combinations of features that classify the data more accurately. In both biomedical applications, feature masking yielded accuracy equivalent to or better than feature weighting alone, while using fewer features for the classification.
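The masking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the chromosome layout (weights followed by mask genes), the 0.5 mask threshold, the Euclidean distance, the majority vote, the plain-accuracy fitness, and the toy data are all assumptions made for illustration, and the function names are hypothetical. The GA search itself (selection, crossover, mutation) is omitted; only the evaluation of a single chromosome is shown.

```python
import numpy as np

def split_chromosome(chromosome, n_features):
    """Split a flat chromosome into feature weights and a boolean mask.

    Assumed layout (not from the paper): n weights, then n mask genes.
    """
    weights = np.asarray(chromosome[:n_features], dtype=float)
    mask = np.asarray(chromosome[n_features:2 * n_features]) > 0.5
    return weights, mask

def masked_knn_predict(train_X, train_y, query, weights, mask, k=3):
    """Classify one query point with knn on weighted, masked features."""
    # A masked feature is scaled by zero, so it drops out of the distance,
    # while its weight stays untouched on the chromosome.
    scale = weights * mask
    diffs = (train_X - query) * scale
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    nearest = np.argsort(dists, kind="stable")[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def fitness(chromosome, train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points classified correctly (assumed fitness)."""
    weights, mask = split_chromosome(chromosome, train_X.shape[1])
    preds = np.array([
        masked_knn_predict(train_X, train_y, q, weights, mask, k)
        for q in test_X
    ])
    return float((preds == test_y).mean())

# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
train_X = np.array([[0.0, 5.0], [0.2, 4.0], [1.0, -5.0], [0.8, -4.0]])
train_y = np.array([0, 0, 1, 1])
test_X = np.array([[0.1, -4.5], [0.9, 4.5]])
test_y = np.array([0, 1])

all_features = [1.0, 1.0, 1, 1]   # both features weighted and unmasked
noise_masked = [1.0, 1.0, 1, 0]   # same weights, noisy feature masked out

acc_all = fitness(all_features, train_X, train_y, test_X, test_y)
acc_masked = fitness(noise_masked, train_X, train_y, test_X, test_y)
print(acc_all, acc_masked)
```

On this toy data the noisy feature dominates the distance, so masking it changes the nearest neighbors while the weight genes are left as they were. A GA could later flip the mask gene back and recover the previously learned weight.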


Presented at the 7th International Conference on Genetic Algorithms, East Lansing, MI, July 19-23, 1997.