Publication Date


Document Type


Committee Members

Travis Doom (Committee Member), Dan Krane (Committee Member), Michael Raymer (Advisor), Dale Wischgoll (Committee Member), Wischgoll Wischgoll (Committee Member)

Degree Name

Doctor of Philosophy (PhD)


Abstract

While genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale, much of that information may remain hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of the synonymous codons are preferred. Isolation of translational efficiency bias can have important applications in gene expression prediction, heterologous protein production, prediction of organismal lifestyle, and identification of candidates for horizontal gene transfer. Methods that identify codon usage bias from genomic sequence data alone can be confounded by other factors that simultaneously influence codon selection. Presented here are new techniques (deterministic and stochastic) for removing the effects of one of the more common confounding factors, GC(AT)-content, and for analyzing and visualizing the search space for codon usage bias through the use of a solution landscape. These techniques successfully isolate expressivity-related codon usage trends, using only genomic sequence information, where other techniques fail because of the confounding influence of GC(AT)-content. While the disambiguation techniques presented here are for genomes confounded by GC(AT)-content usage trends, these methods should be equally applicable to any other well-characterized confounding bias.

Page Count


Department or Program

Department of Computer Science and Engineering

Year Degree Awarded