Authors

Erya Huang

Type

Text

Type

Dissertation

Advisor

Zhu, Wei | Wang, Xuefeng | Bahou, Wadie | Wu, Song

Date

2015-12-01

Keywords

Statistics

Department

Department of Applied Mathematics and Statistics.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77665

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

Genome-wide association (GWA) studies are an important tool for identifying disease susceptibility variants for common and complex diseases. Traditional approaches to data analysis in GWA studies suffer from the multiple testing problem and ignore potential relationships between gene variants. We introduced a novel two-stage framework that combines partial correlation network analysis (PCNA) with data mining techniques. This network-based technique, which focuses on joint modeling of SNPs and their partial associations, alleviated the multiple testing problem and consequently increased the power to detect biologically relevant variants and their associations. Variable selection was achieved through penalized logistic regression with a sparse-group lasso (SGL) penalty, grouping SNPs by either 1) pairwise canonical correlation measurements or 2) biological information such as gene mapping. Network construction was based on pairwise partial correlation coefficients. Simulation studies indicated that this two-stage approach achieved high accuracy and a low false-positive rate in identifying known individual and two-way association targets, which shows that the true direct relationships can be recovered even in high-dimensional settings. We then illustrated the proposed approach in a search for potentially significant SNP-SNP/gene-gene associations with nicotine dependence, using real data from a GWA study conducted by the Washington University at St. Louis. The results provide researchers with potentially biologically relevant genetic networks for further investigation. Another contribution of this thesis is the exploration of the miRNA-mRNA regulatory set associated with essential thrombocytosis (ET) by applying a penalized technique to canonical correlation analysis on microarray data sets. The identified variables were successfully tested by leave-one-out cross-validation and a network exploration system. | 138 pages
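
As a rough illustration of the network-construction step named in the abstract, pairwise partial correlation coefficients can be read off the inverse covariance (precision) matrix. The sketch below is not the dissertation's implementation: the function name `partial_correlation_matrix` and the simulated chain a -> b -> c are assumptions made for the example, and it assumes more samples than variables, so it skips the sparse-group lasso selection stage that the high-dimensional case would need first.

```python
import numpy as np

def partial_correlation_matrix(X):
    """Pairwise partial correlations, each conditioned on all other columns.

    Hypothetical helper: assumes X is (n_samples, n_variables) with
    n_samples > n_variables so the covariance matrix is invertible.
    """
    cov = np.cov(X, rowvar=False)
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    # Standard identity: rho_ij|rest = -P_ij / sqrt(P_ii * P_jj)
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated chain a -> b -> c: a and c are marginally correlated,
# but only through b, so their partial correlation should be near zero.
rng = np.random.default_rng(0)
n = 20000
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
X = np.column_stack([a, b, c])

marginal = np.corrcoef(X, rowvar=False)
pcor = partial_correlation_matrix(X)
print("marginal corr(a, c):", round(marginal[0, 2], 3))
print("partial  corr(a, c):", round(pcor[0, 2], 3))
```

In this toy setting the marginal correlation between a and c is clearly nonzero while their partial correlation is close to zero, which is the sense in which a partial correlation network keeps direct relationships and drops indirect ones.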
