Gerald Alter (Committee Member), Travis Doom (Committee Member), Ruth Pachter (Committee Member), Michael Raymer (Committee Chair), Mateen Rizki (Committee Member)
Doctor of Philosophy (PhD)
This dissertation develops and demonstrates a method, based on Shannon's information theory, for measuring the uncertainty of the secondary structure of protein sequences. The method is applied to a newly developed large dataset of chameleon sequences and to several protein hinges culled from the Hinge Atlas. For every amino acid in a sequence, the secondary structure uncertainty of the central residue of the corresponding tripeptide is computed, using Cuff and Barton's CB513 as the reference set. It is shown that while secondary structure uncertainty is relatively high in chameleon regions [avg = 1.27 bits], it is relatively low in the 1-7 residues flanking a chameleon [N-terminal flank avg = 1.12 bits; C-terminal flank avg = 1.16 bits]. These differences are highly statistically significant [p = 9.6E-18 and p = 2.9E-12, respectively]. In contrast, the secondary structure uncertainty of hinge regions did not differ to a statistically significant degree once a Bonferroni multiple-test correction was applied.
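The tripeptide-based uncertainty measure described above can be illustrated with a minimal sketch. It assumes secondary structure has been reduced to three states (H = helix, E = strand, C = coil). It also assumes each reference protein is supplied as a pair of equal-length sequence and structure strings, as one might extract from CB513. Terminal residues and tripeptides absent from the reference are left undefined, and no pseudocounts are applied. The function names `tripeptide_entropy_table` and `residue_uncertainty` are hypothetical and are not the dissertation's code. With three states, the entropy H = -Σ p log2 p is bounded above by log2 3 ≈ 1.585 bits, which gives a scale for the averages reported above.

```python
import math
from collections import Counter, defaultdict


def tripeptide_entropy_table(reference):
    """Map each tripeptide to the Shannon entropy (in bits) of the
    secondary-structure state observed at its central residue.

    Assumption: `reference` is an iterable of (sequence, states) string pairs
    of equal length, with states in a reduced alphabet such as 'H', 'E', 'C'.
    """
    counts = defaultdict(Counter)
    for seq, ss in reference:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must be the same length")
        # Every interior residue is the centre of exactly one tripeptide.
        for i in range(1, len(seq) - 1):
            counts[seq[i - 1:i + 2]][ss[i]] += 1

    table = {}
    for tripeptide, state_counts in counts.items():
        total = sum(state_counts.values())
        # H = sum over states of p * log2(1/p), equivalent to -sum p * log2(p).
        table[tripeptide] = sum(
            (n / total) * math.log2(total / n) for n in state_counts.values()
        )
    return table


def residue_uncertainty(seq, table):
    """Per-residue uncertainty for a query sequence.

    Returns None for the two terminal residues and for any residue whose
    tripeptide never occurs in the reference table (a simplifying assumption).
    """
    values = [None] * len(seq)
    for i in range(1, len(seq) - 1):
        values[i] = table.get(seq[i - 1:i + 2])
    return values


if __name__ == "__main__":
    # Toy reference set (not CB513): "GAV" is centred on H once and E once.
    toy_reference = [("GAVG", "CHEC"), ("GAVG", "CEEC")]
    table = tripeptide_entropy_table(toy_reference)
    print(table)                               # {'GAV': 1.0, 'AVG': 0.0}
    print(residue_uncertainty("GAVG", table))  # [None, 1.0, 0.0, None]
```

In this toy data, the tripeptide GAV reaches the two-state maximum of 1 bit because its central residue is observed equally often as helix and strand. AVG is always strand, so it carries 0 bits of uncertainty. Averaging such per-residue values over a chameleon or its flanking residues would give region-level figures comparable in spirit to those reported in the abstract.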
A new hand-curated database of long "chameleon" sequences was also developed. It contains nine sequences of eight residues and eighty-five sequences of seven residues.
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright 2011, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.