Publication Date


Document Type


Committee Members

Michael Raymer (Advisor)

Degree Name

Master of Science (MS)


Organisms expend a significant fraction of their overall energy budget on the creation of proteins, particularly those that are produced in large quantities. Recent research has demonstrated that the genes encoding these proteins are shaped by natural selection to build the proteins from low-cost building blocks (amino acids) whenever possible. This negative correlation between a protein’s production rate and its energetic cost has been established for two bacterial genomes: Escherichia coli and Bacillus subtilis. This thesis provides further validation of the theory by automating the analysis and extending the research to additional genomes. Investigations into building block selection are highly computational in nature. Automating the process requires diverse methodologies, including principal component analysis, calculation of Mahalanobis distance, and the execution of Mantel-Haenszel and Bonferroni tests. To verify that the observed trend is caused by energetic cost minimization, it is necessary to eliminate as many alternative explanations as possible. This is accomplished by demonstrating that the trend is not localized to any particular region of the protein’s primary structure and that it is consistent across all genes regardless of function. This investigation of the energetic cost of polypeptide synthesis provides valuable insights into protein building block selection. For example, parasitic organisms appear to exhibit no correlation between protein production rate and amino acid cost. When the costs of the building blocks that the parasite obtains from its host are removed, however, the trend once again becomes evident.

Page Count


Department or Program

Department of Computer Science

Year Degree Awarded