Publication Date


Document Type


Committee Members

Gerald Alter (Committee Member), Travis Doom (Committee Member), Ruth Pachter (Committee Member), Michael Raymer (Committee Chair), Mateen Rizki (Committee Member)

Degree Name

Doctor of Philosophy (PhD)


This dissertation develops and demonstrates a method for measuring the uncertainty of the secondary structure of protein sequences using Shannon's information theory. The method is applied to a newly developed large dataset of chameleon sequences and to several protein hinges culled from the Hinge Atlas. For each amino acid in a sequence, the uncertainty of the central residue of its tripeptide is computed, using Cuff and Barton's CB513 as the reference set. It is shown that while secondary structure uncertainty is relatively high in chameleon regions [avg = 1.27 bits], it is relatively low in the regions 1-7 residues nearest a chameleon [N-terminus flank avg = 1.12 bits; C-terminus flank avg = 1.16 bits]. These differences are shown to be highly statistically significant [p = 9.6E-18 and p = 2.9E-12, respectively]. In contrast, the secondary structure uncertainty of hinge regions was not found to differ to a statistically significant degree once a Bonferroni multiple-test correction was applied.
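As a rough illustration of the tripeptide-based measure described above, the following minimal Python sketch computes the Shannon entropy (in bits) of the secondary structure observed at the central residue of each tripeptide in a reference set. It is not the dissertation's implementation: the function names, the one-letter structure labels (H, E, C), and the two-sequence toy reference set are assumptions made for illustration, and the toy data does not come from CB513.

```python
import math
from collections import Counter, defaultdict


def build_tripeptide_counts(reference):
    """Count secondary-structure labels seen at the centre of each tripeptide.

    `reference` is an iterable of (sequence, structure) string pairs of equal
    length, e.g. amino acids paired with H/E/C labels (an assumed encoding).
    """
    counts = defaultdict(Counter)
    for seq, ss in reference:
        for i in range(1, len(seq) - 1):
            counts[seq[i - 1:i + 2]][ss[i]] += 1
    return counts


def shannon_entropy(counter):
    """Shannon entropy in bits of the empirical distribution in `counter`."""
    total = sum(counter.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counter.values() if n)


def sequence_uncertainty(seq, counts):
    """Entropy of the central residue for every tripeptide window in `seq`.

    Returns one value per interior residue; None marks a tripeptide that
    never occurs in the reference set.
    """
    result = []
    for i in range(1, len(seq) - 1):
        observed = counts.get(seq[i - 1:i + 2])
        result.append(shannon_entropy(observed) if observed else None)
    return result


# Made-up toy reference set (NOT CB513), for illustration only.
toy_reference = [("ACDA", "HHEC"), ("ACDE", "CEEC")]
toy_counts = build_tripeptide_counts(toy_reference)
toy_profile = sequence_uncertainty("ACDW", toy_counts)
```

In the toy data the tripeptide ACD is seen once with a helical and once with a strand central residue, so its uncertainty is 1 bit, while the unseen tripeptide CDW yields None. With three structure labels the entropy is bounded above by log2(3), roughly 1.58 bits.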

A new hand-curated database of long "chameleon" sequences was developed. It contains nine sequences of length eight and eighty-five sequences of length seven.

Page Count


Department or Program

Department of Computer Science and Engineering

Year Degree Awarded