Committee Members

Travis Doom (Committee Member), Gengxin Li (Committee Member), Michael Markey (Committee Member), Michael Raymer (Advisor), Nicholas Reo (Committee Member)

Degree Name

Doctor of Philosophy (PhD)


Two challenging problems in the clinical study of cancer are the characterization of cancer subtypes and the classification of individual patients according to those subtypes. Understanding the role of differential gene expression in the development of, and molecular response to, cancer is also difficult, in part because of the sheer number of genes and gene products involved. Traditional statistical approaches to these problems are hindered by within-class heterogeneity and by the challenges inherent in integrating high-dimensional data. In addition, many current machine learning methods do not lend themselves to biological interpretation. To mitigate these challenges, we have developed a novel Latent Dirichlet Allocation (LDA)-based approach that classifies unknown samples by the similarity of their co-expression patterns. Integrating this approach with several recently developed feature engineering and visualization methods, including Disease Specific Genomic Analysis (DSGA) and topological data analysis (TDA), we developed an analysis pipeline that achieves high accuracy relative to state-of-the-art approaches. We demonstrate the effectiveness of this pipeline on several data sets, including RNA-Seq data from Illumina HiSeq 2000 for breast cancer and lung cancer identification, mRNA expression data from Agilent Hu25k microarray for breast cancer subtype identification, and copy-number data from Affymetrix SNP6.0 for melanoma identification. We also present a functional analysis that identifies relevant genes and associated pathways potentially involved in distinguishing tumor types.

Department or Program

Biomedical Sciences

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License