Authors

Qiao Zhang

Type

Text | Dissertation

Advisor

Zhu, Wei | Wu, Song | Yang, Jie | Cao, Jian

Date

2015-12-01

Keywords

Statistics | gene pathway, microarray

Department

Department of Applied Mathematics and Statistics

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of the degree.

Identifier

http://hdl.handle.net/11401/76527

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf, 130 pages

Abstract

A gene pathway typically refers to a group of genes and small molecules that work together to control one or more cell functions. In systems biology, pathway analysis is of paramount biological importance, and recent studies have revealed that malfunctioning gene pathways can induce disease manifestations such as cancer. A gene pathway usually consists of two components: the upstream factors, which are signaling molecules that transmit stimuli from the cell surface to the nucleus, and the downstream factors, which respond to cell signaling through changes in their expression levels. Although several methods have been reported for the analysis of gene pathways, almost all of them focus on the upstream factors of a pathway and ignore the rich information carried by the downstream factors. In this thesis, we first investigated and compared existing gene pathway analysis methods, focusing on the three most popular ones: Gene Set Enrichment Analysis (GSEA), Principal Component Analysis (PCA), and Canonical Discriminant Analysis (CDA). We then proposed a new method that integrates statistical information from both the upstream and downstream factors to infer differential gene pathways. Specifically, the Relaxed Intersection-Union Test (RIUT) framework was employed to combine evidence from the upstream and downstream factors. We performed extensive simulation studies with GSEA, PCA and CDA, which revealed the strengths and limitations of these methods under various data structures and identified the scenarios in which each method outperforms the others. Furthermore, we demonstrated that our proposed combining method outperforms these existing methods in terms of both power and biological interpretability. We applied the combining method to two real data sets: the p53 data set and the essential thrombocythaemia data set.
The results suggest that, within the combining method, GSEA is more appropriate for the upstream subgroup and CDA is more powerful for the downstream subgroup, owing to their distinct data structures.
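The abstract does not give the test statistics or the exact relaxed IUT decision rule, so the following is only a minimal sketch of the general idea of combining upstream and downstream evidence with an intersection-union test. It assumes each subgroup is tested separately with a permutation test on a sum of squared Welch t statistics (a hypothetical stand-in, not GSEA, PCA or CDA), and it combines the two p-values with the standard, non-relaxed IUT, which calls a pathway differential only when both subgroups are, i.e. it uses the larger p-value. The function names `subgroup_stat`, `permutation_pvalue` and `iut_pvalue` and the toy data are hypothetical.

```python
import random
import statistics


def welch_t(x, y):
    """Welch two-sample t statistic for one gene (x: group 0, y: group 1)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = (vx / len(x) + vy / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def subgroup_stat(expr, labels):
    """Hypothetical subgroup statistic: sum of squared per-gene t statistics.

    expr: list of genes, each a list of expression values (one per sample)
    labels: list of 0/1 group labels, one per sample
    """
    total = 0.0
    for gene in expr:
        a = [v for v, g in zip(gene, labels) if g == 0]
        b = [v for v, g in zip(gene, labels) if g == 1]
        total += welch_t(a, b) ** 2
    return total


def permutation_pvalue(expr, labels, n_perm=999, seed=0):
    """Permutation p-value for 'no differential expression in this subgroup'."""
    rng = random.Random(seed)
    observed = subgroup_stat(expr, labels)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # randomly reassign sample labels
        if subgroup_stat(expr, perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def iut_pvalue(p_upstream, p_downstream):
    """Standard (non-relaxed) intersection-union test.

    The pathway is declared differential only if BOTH subgroups are, so the
    combined p-value is the larger of the two subgroup p-values.
    """
    return max(p_upstream, p_downstream)


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [0] * 6 + [1] * 6
    # Hypothetical toy data: 5 upstream genes with no group effect and
    # 8 downstream genes shifted by +2 in group 1.
    upstream = [[rng.gauss(0, 1) for _ in labels] for _ in range(5)]
    downstream = [[rng.gauss(2.0 * g, 1) for g in labels] for _ in range(8)]
    p_up = permutation_pvalue(upstream, labels)
    p_down = permutation_pvalue(downstream, labels)
    print("upstream p:", p_up, "downstream p:", p_down,
          "IUT p:", iut_pvalue(p_up, p_down))
```

Because the standard IUT uses the maximum p-value, it controls the type I error without a multiplicity correction but can be conservative. That conservativeness is the usual motivation for relaxed variants such as the RIUT named in the abstract.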
