A Novel Approach for Cancer Characterization Using Latent Dirichlet Allocation and Disease-Specific Genomic Analysis
Travis Doom (Committee Member), Gengxin Li (Committee Member), Michael Markey (Committee Member), Michael Raymer (Advisor), Nicholas Reo (Committee Member)
Doctor of Philosophy (PhD)
Two challenging problems in the clinical study of cancer are the characterization of cancer subtypes and the classification of individual patients according to those subtypes. Further, understanding the role of differential gene expression in the development of, and molecular response to, cancer remains challenging, in part because of the sheer number of genes and gene products involved. Traditional statistical approaches to these problems are hindered by within-class heterogeneity and by the difficulty of integrating high-dimensional data. In addition, many current machine learning methods do not lend themselves to biological interpretation. To mitigate these challenges, we have developed a novel Latent Dirichlet Allocation (LDA)-based approach that classifies unknown samples by the similarity of their co-expression patterns. Integrating this approach with several recently developed feature engineering and visualization methods, including Disease-Specific Genomic Analysis (DSGA) and topological data analysis (TDA), we developed an analysis pipeline that achieves high accuracy relative to state-of-the-art approaches. We demonstrate the effectiveness of this pipeline on several data sets: RNA-Seq data from Illumina HiSeq 2000 for breast cancer and lung cancer identification, mRNA expression data from the Agilent Hu25k microarray for breast cancer subtype identification, and copy-number data from Affymetrix SNP6.0 for melanoma identification. We also present a functional analysis that identifies relevant genes and associated pathways that may be involved in distinguishing tumor types.
Copyright 2018, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.