Publication Date

2009

Document Type

Thesis

Committee Members

Travis Doom (Committee Member), Dan Krane (Committee Member), Michael Raymer (Advisor), Thomas Sudkamp (Other), Joseph F. Thomas, Jr. (Other)

Degree Name

Master of Science (MS)

Abstract

Organisms construct proteins from individual amino acids using instructions encoded in the nucleotide sequence of a DNA molecule. The genetic code associates each amino acid with one or more combinations of three nucleotides, called codons. Most amino acids are encoded by several synonymous codons. Although synonymous codons yield the same amino acid and therefore have no effect on the final protein, they are not used in equal proportions in the genomes of most organisms. This phenomenon is known as codon usage bias, and the literature has shown that each organism displays its own distinctive pattern of codon usage. Research also suggests that organisms with similar codon usage share other biological characteristics as well. This thesis helps to verify this hypothesis by applying an existing computational algorithm, together with multivariate analysis, to show that the codon usage of free-living prokaryotes differs significantly from that of obligate intracellular prokaryotes. The observed difference results primarily from GC content, with an additional contribution from an unknown factor.
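The ideas of synonymous codons, codon usage bias, and GC content can be made concrete with a short sketch. The Python below is not the existing algorithm or the multivariate analysis used in this thesis. It assumes an in-frame DNA coding sequence and uses a hypothetical two-amino-acid subset of the genetic code (SYNONYMOUS_CODONS). It counts codons, computes relative synonymous codon usage (RSCU), a measure commonly used in the literature, and reports both overall GC content and GC content at the third codon position (GC3), where synonymous codons most often differ.

```python
from collections import Counter

# Hypothetical subset of the standard genetic code, for illustration only:
# two amino acids and their synonymous codons (DNA alphabet).
SYNONYMOUS_CODONS = {
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def codon_counts(seq: str) -> Counter:
    """Count codons in an in-frame coding sequence, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def rscu(counts: Counter, synonyms: dict) -> dict:
    """Relative synonymous codon usage: observed count divided by the count expected
    if every synonymous codon were used equally. A value of 1.0 means no bias."""
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence; RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in the whole sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc3(counts: Counter) -> float:
    """Fraction of codons whose third position is G or C."""
    total = sum(counts.values())
    gc = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc / total if total else 0.0


if __name__ == "__main__":
    seq = "TTTTTTTTCGGCGGCGGCGGA"  # TTT TTT TTC GGC GGC GGC GGA
    counts = codon_counts(seq)
    print(rscu(counts, SYNONYMOUS_CODONS))
    print(round(gc_content(seq), 3), round(gc3(counts), 3))
```

In this toy sequence, phenylalanine is encoded twice by TTT and once by TTC, so their RSCU values are about 1.33 and 0.67; equal use would give 1.0 for both. A real analysis would use the full genetic code and every coding sequence in a genome.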

Although the existing literature often refers to the strength of codon usage bias, it offers no clear, consistent definition of the concept. This thesis provides an unambiguous definition of bias strength and clarifies how it relates to other properties of biased codon usage. It then proposes a bias strength metric designed to match this definition. Evaluation shows that the metric compares favorably with existing metrics used in the literature as criteria for bias strength, and suggests that codon usage bias generally tends to be either strong and present throughout the genome, or weak and present in only a subset of the genome. Analysis of these metrics provides insight into the unknown factor partially responsible for the codon usage difference between free-living and obligate intracellular prokaryotes, and the proposed bias strength metric is used to draw conclusions about the characteristics of GC-content bias.

Page Count

114

Department or Program

Department of Computer Science

Year Degree Awarded

2009
