Type
Text
Type
Dissertation
Advisor
Mendell, Nancy R. | Wu, Song | Finch, Stephen J. | Gordon, Derek
Date
2012-08-01
Keywords
Statistics | genome wide association study, longitudinal quantitative trait locus, population stratification, principal component analysis
Department
Department of Applied Mathematics and Statistics
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/71024
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
Genome-wide association studies (GWAS) are widely used to detect genotypes associated with complex diseases. GWAS of disease progression over time may be clinically significant. Longitudinal quantitative trait locus (LQTL) methods are used in these studies to simulate disease progression. However, population stratification (PS) can lead to false positive or false negative findings in a GWAS. PS is induced by variation in a candidate marker's allele frequency across ancestral populations. One approach used to adjust for PS in GWAS is global principal component analysis (PCA). In this thesis I examine the statistical properties of GWAS analysis procedures using principal component adjustments across the whole genome. I use additive risk allele models to test the association between rare genetic variants and the longitudinal quantitative phenotypes across the whole genome. The genotype data are taken from the HapMap 3 dataset for 1198 unrelated individuals. The simulated quantitative phenotype data are estimated using the Bayesian posterior probabilities (BPPs) that a participant belongs to a clinically important trajectory curve. The PCA method implemented in the EIGENSTRAT program is then used to reduce the data to ten variables containing most of the genetic variability information. The power and rejection rates are evaluated based on 1000 simulated replicates. The association test statistic follows a chi-square distribution with one degree of freedom under the null hypothesis of no association. The p-values of the test of the genotype coefficient, with and without a PC adjustment for PS, are documented. For each disease gene, I select 25 matching SNPs (those whose allele frequencies are highly correlated with the disease gene's across populations) and 25 non-correlated SNPs (those whose allele frequencies have low correlation with the disease gene's across populations).
All SNPs considered are in overall Hardy-Weinberg equilibrium (HWE). The additive risk allele LQTL models have strong empirical power. The model with global PCA adjustment for PS consistently maintains correct false positive rates. | 164 pages
Recommended Citation
Wang, Yifan, "Adjusting for population stratification in longitudinal quantitative trait locus identification" (2012). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 231.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/231