Genetic Programming for Improved Data Mining: Application to the Biochemistry of Protein Interactions
We have previously shown how a genetic algorithm (GA) can be used to perform "data mining," the discovery of important data within large datasets, by finding optimal data classifications using known examples. These approaches, while successful, limited the relationships among the data to those that were fixed before the GA run. We report here on an extension of that work in which a genetic program (GP) is substituted for the GA. Like the GA, the GP could optimize data classification, but it could also determine the functional relationships among the features. This gave improved performance and new information on important relationships among features. We discuss the overall approach and compare the effectiveness of the GA and the GP on a biochemistry problem: determining the involvement of bound water molecules in protein interactions.
Raymer, M. L., Punch, W. F., Goodman, E. D., & Kuhn, L. A.
(1996). Genetic Programming for Improved Data Mining: Application to the Biochemistry of Protein Interactions. Proceedings of the First Annual Conference on Genetic Programming, 375-380.