Publication Date
2011
Document Type
Dissertation
Committee Members
Gerald Alter (Committee Member), Travis Doom (Committee Member), Ruth Pachter (Committee Member), Michael Raymer (Committee Chair), Mateen Rizki (Committee Member)
Degree Name
Doctor of Philosophy (PhD)
Abstract
This dissertation develops and demonstrates a method for measuring the secondary structure uncertainty of protein sequences using Shannon's information theory. The method is applied to a newly developed large dataset of chameleon sequences and to several protein hinges culled from the Hinge Atlas. For each amino acid in a sequence, the uncertainty of the central residue of its tripeptide is computed using Cuff and Barton's CB513 as the reference set. Secondary structure uncertainty is shown to be relatively high in chameleon regions [avg = 1.27 bits] but relatively low in the flanking regions 1-7 residues nearest a chameleon [N-terminus flank avg = 1.12 bits; C-terminus flank avg = 1.16 bits]. These differences are highly statistically significant [p = 9.6E-18 and p = 2.9E-12, respectively]. The secondary structure uncertainty of hinge regions, by contrast, was not found to differ to a statistically significant degree once a Bonferroni multiple-test correction was applied.
A new hand-curated database of long "chameleon" sequences was also developed. It contains nine sequences of length eight and eighty-five sequences of length seven.
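The tripeptide-based uncertainty measure described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the toy reference data, and the handling of sequence ends and unseen tripeptides are all assumptions made for illustration. The sketch counts which secondary structure state the central residue of each tripeptide takes in a reference set, then reports the Shannon entropy of that distribution in bits.

```python
import math
from collections import Counter, defaultdict

def build_tripeptide_counts(reference):
    """Count central-residue structure states per tripeptide.

    reference: iterable of (sequence, structure) string pairs of equal
    length (a hypothetical stand-in for a set such as CB513).
    """
    counts = defaultdict(Counter)
    for seq, ss in reference:
        for i in range(1, len(seq) - 1):
            counts[seq[i - 1:i + 2]][ss[i]] += 1
    return counts

def shannon_entropy(counter):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counter.values())
    return sum(-(n / total) * math.log2(n / total)
               for n in counter.values() if n)

def sequence_uncertainty(seq, counts):
    """Per-residue uncertainty in bits.

    Assumption: terminal residues and tripeptides absent from the
    reference set get None rather than a numeric value.
    """
    result = [None] * len(seq)
    for i in range(1, len(seq) - 1):
        observed = counts.get(seq[i - 1:i + 2])
        if observed:
            result[i] = shannon_entropy(observed)
    return result

# Toy reference data (invented for illustration; H = helix, E = strand, C = coil).
reference = [("ACDA", "HHEC"), ("ACDE", "CEEC")]
counts = build_tripeptide_counts(reference)
print(sequence_uncertainty("ACDA", counts))
```

In this toy example the tripeptide ACD is seen once with a helical and once with a strand central residue, giving 1 bit of uncertainty, while CDA is seen in only one state, giving 0 bits. If a three-state alphabet is assumed, the largest possible value is log2(3), roughly 1.585 bits.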
Page Count
182
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2011
Copyright
Copyright 2011, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.