Publication Date

2011

Document Type

Dissertation

Committee Members

Gerald Alter (Committee Member), Travis Doom (Committee Member), Ruth Pachter (Committee Member), Michael Raymer (Committee Chair), Mateen Rizki (Committee Member)

Degree Name

Doctor of Philosophy (PhD)

Abstract

This dissertation develops and demonstrates a method for measuring the uncertainty of the secondary structure of protein sequences using Shannon's information theory. The method is applied to a newly developed large dataset of chameleon sequences and to several protein hinges culled from the Hinge Atlas. For each amino acid in a sequence, the secondary structure uncertainty of the central residue of the tripeptide centered on it is computed, using Cuff and Barton's CB513 as the reference set. Secondary structure uncertainty is shown to be relatively high in chameleon regions [avg = 1.27 bits] but relatively low in the regions 1-7 residues nearest a chameleon [N-terminus flank avg = 1.12 bits; C-terminus flank avg = 1.16 bits]. These differences are shown to be highly statistically significant [p = 9.6E-18 and p = 2.9E-12, respectively]. In contrast, the secondary structure uncertainty of hinge regions was not found to differ to a statistically significant degree once a Bonferroni multiple-test correction was applied.

A new hand-curated database of long "chameleon" sequences was developed. It contains nine sequences of length eight and eighty-five sequences of length seven.
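
The uncertainty measure described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the three-state (H/E/C) structure alphabet, and the tiny reference set below are assumptions made for illustration, and the toy data merely stands in for CB513. For each tripeptide, the sketch tallies the secondary structure states observed at the central residue in the reference set, then reports the Shannon entropy of that distribution in bits.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reference set of (sequence, secondary structure) pairs.
# It stands in for CB513; the three-state H/E/C alphabet is an assumption.
TOY_REFERENCE = [
    ("ACDA", "HHEC"),
    ("ACDG", "CECC"),
]

def build_tripeptide_counts(reference):
    """Count the structure states seen at the central residue of each tripeptide."""
    counts = defaultdict(Counter)
    for seq, ss in reference:
        for i in range(1, len(seq) - 1):
            counts[seq[i - 1:i + 2]][ss[i]] += 1
    return counts

def shannon_entropy(counter):
    """Shannon entropy, in bits, of the distribution given by a Counter."""
    total = sum(counter.values())
    return sum((n / total) * math.log2(total / n) for n in counter.values() if n > 0)

def sequence_uncertainty(seq, counts):
    """Entropy for each interior residue of seq; None if the tripeptide is unseen."""
    result = []
    for i in range(1, len(seq) - 1):
        observed = counts.get(seq[i - 1:i + 2])
        result.append(shannon_entropy(observed) if observed else None)
    return result

counts = build_tripeptide_counts(TOY_REFERENCE)
print(sequence_uncertainty("ACDA", counts))
```

Under the three-state assumption, the entropy of a single position cannot exceed log2(3), roughly 1.585 bits. The first and last residues of a sequence have no complete tripeptide in this sketch and are skipped; how terminal residues and unseen tripeptides should be handled is left open here.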

Page Count

182

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2011

