Genetic Programming for Improved Data Mining: Application to the Biochemistry of Protein Interactions

Document Type

Conference Proceeding

Publication Date



We have previously shown how a genetic algorithm (GA) can be used to perform "data mining," the discovery of particular or important data within large datasets, by finding optimal data classifications from known examples. Although successful, those approaches limited data relationships to ones that were fixed before the GA run. Here we report an extension of that work in which a genetic program (GP) is substituted for the GA. Like the GA, the GP could optimize data classification, but it could also determine the functional relationships among the features. This gave improved performance and new information on important relationships among features. We discuss the overall approach and compare the effectiveness of the GA and the GP on a biochemistry problem: determining the involvement of bound water molecules in protein interactions.
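The distinction drawn above, between a GA that tunes parameters of a relationship fixed in advance and a GP that can also evolve the relationship itself, can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the operator set, the weighted-sum form used for the GA, the threshold classifier, and the toy examples are all assumptions made for illustration, and no evolutionary loop is shown.

```python
import operator
import random

# Hypothetical feature names; the abstract does not list the features used.
FEATURES = ["f0", "f1", "f2"]
# Assumed operator set for the GP expression trees.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def ga_score(weights, sample):
    """GA-style individual: the functional form (here a weighted sum) is
    fixed before the run; only the numeric weights would be evolved."""
    return sum(w * sample[name] for w, name in zip(weights, FEATURES))


def random_tree(rng, depth):
    """GP-style individual: a nested tuple (op, left, right) with feature
    names at the leaves, so the relationship among features can vary."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(FEATURES)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def gp_score(tree, sample):
    """Evaluate an expression tree on one sample (a dict of feature values)."""
    if isinstance(tree, str):
        return sample[tree]
    op, left, right = tree
    return OPS[op](gp_score(left, sample), gp_score(right, sample))


def accuracy(score_fn, individual, examples, threshold=0.0):
    """Fitness: fraction of known examples classified correctly when the
    score is compared against a threshold."""
    hits = sum((score_fn(individual, x) > threshold) == label
               for x, label in examples)
    return hits / len(examples)


# Invented toy data: the label is True when f0 * f1 - f2 > 0.
EXAMPLES = [
    ({"f0": 2.0, "f1": 3.0, "f2": 1.0}, True),
    ({"f0": -2.0, "f1": -3.0, "f2": 1.0}, True),
    ({"f0": 2.0, "f1": -3.0, "f2": 1.0}, False),
    ({"f0": 0.5, "f1": 0.5, "f2": 1.0}, False),
]

# A tree the GP search space contains but the fixed weighted sum cannot express.
PRODUCT_TREE = ("sub", ("mul", "f0", "f1"), "f2")

if __name__ == "__main__":
    print("GA, weights (1, 1, -1):", accuracy(ga_score, (1.0, 1.0, -1.0), EXAMPLES))
    print("GP, product tree:", accuracy(gp_score, PRODUCT_TREE, EXAMPLES))
    print("A random GP individual:", random_tree(random.Random(0), 3))
```

In a full system, both kinds of individual would be scored with a fitness such as `accuracy` and improved by selection, crossover, and mutation. The only point of the sketch is that a tree like `PRODUCT_TREE` encodes an interaction between features that a fixed weighted sum has no way to represent.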


Presented at the 1st Annual Conference on Genetic Programming, Stanford, CA, July 28-31, 1996.