Type
Text
Type
Dissertation
Advisor
Zhu, Wei | Wu, Song | Yang, Jie | Cao, Jian
Date
2015-12-01
Keywords
Statistics | gene pathway, microarray
Department
Department of Applied Mathematics and Statistics
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/76527
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
A gene pathway typically refers to a group of genes and small molecules that work together to control one or more cell functions. In systems biology, pathway analysis is of paramount biological importance, and recent studies have revealed that malfunction of gene pathways can induce disease manifestations such as cancer. A gene pathway usually consists of two components: the upstream factors, which are signaling molecules that transmit stimuli from the cell surface to the nucleus, and the downstream factors, which respond to cell signaling through changes in their expression levels. Although several methods have been reported for the analysis of gene pathways, almost all of them focus on the upstream factors of a pathway and ignore the rich information from the downstream factors. In this thesis work, we first investigated and compared the existing gene pathway analysis methods, focusing on the three most popular: Gene Set Enrichment Analysis (GSEA), Principal Component Analysis (PCA), and Canonical Discriminant Analysis (CDA). We then proposed an innovative method that integrates the statistical information from both upstream and downstream factors to infer differential gene pathways. More specifically, the Relax Intersection-Union Test (RIUT) framework was employed to combine evidence from the upstream and downstream factors. We performed intensive simulation studies with GSEA, PCA, and CDA. We identified both the limitations and the strengths of these methods under various data structures, as well as the scenarios in which each method can outperform the others. Furthermore, we demonstrated that our proposed combining method outperforms these existing methods in terms of both power and biological interpretability. We applied the combining method to two real data sets: the p53 data set and the essential thrombocythaemia data set. The results suggest that, in the combining method, GSEA is more appropriate for the upstream subgroup and CDA is more powerful for the downstream subgroup because of their distinct data structures. | 130 pages
Recommended Citation
Zhang, Qiao, "Identification of Differential Gene Pathways in Microarray Data" (2015). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 2434.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/2434