Type
Text
Type
Dissertation
Advisor
Mendell, Nancy | Finch, Stephen J. | Zhu, Wei | Gordon, Derek.
Date
2014-12-01
Keywords
Statistics
Department
Department of Applied Mathematics and Statistics.
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/76538
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
Genotype misclassification errors are known to reduce the power to detect genetic association, but the size of this effect in next-generation sequencing (NGS) is not known. The non-centrality parameter (NCP), and hence the power, of the association test allowing for errors was derived for a specified error model at a base pair. This NCP was compared to the NCP of the usual chi-square test. The asymptotic power was compared to simulated power for specific settings of the true genotype and phenotype frequencies in the case and control populations, the genotype misclassification rates, and the total sample size. An R script was provided for calculating the NCP. Next, the effect of misclassification error in NGS data on case-control genetic association studies was modeled, and the Likelihood Ratio Test Allowing for Error using NGS data (LRTNGS) was derived. The genotype frequencies and misclassification rates were estimated from the observed base pair reads using the expectation-maximization (EM) algorithm. The statistic allows for both non-differential and differential misclassification. The distribution of LRTNGS was studied by simulation under both null and alternative settings. The effects of genotype misclassification rates on the sample size needed to maintain constant asymptotic Type I and Type II error rates were studied. For at-risk minor allele frequencies less than 0.01, large sample sizes were required for the asymptotic distribution to be a good approximation. Increasing the sequencing coverage increased the estimated power and the adequacy of simulated power. | 110 pages
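As a rough illustration of the NCP comparison the abstract describes, the sketch below computes the NCP of the usual Pearson chi-square test on a 2 x 3 case-control genotype table, first from the true genotype frequencies and then from the frequencies observed after non-differential misclassification. It is not the dissertation's R script or its error model: the function names, the example frequencies, the sample sizes, and the error matrix are all hypothetical, and the formula used is the standard large-sample NCP for a test of homogeneity, assumed here for illustration only.

```python
from typing import List, Sequence


def observed_frequencies(true_freqs: Sequence[float],
                         error_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Genotype frequencies after misclassification.

    error_matrix[i][j] is the (assumed) probability that true genotype i
    is recorded as genotype j; each row should sum to 1.
    """
    n_observed = len(error_matrix[0])
    return [
        sum(true_freqs[i] * error_matrix[i][j] for i in range(len(true_freqs)))
        for j in range(n_observed)
    ]


def chi_square_ncp(case_freqs: Sequence[float],
                   control_freqs: Sequence[float],
                   n_cases: int,
                   n_controls: int) -> float:
    """NCP of the Pearson chi-square test of homogeneity for a 2 x k table.

    Standard large-sample form:
        n_cases * n_controls * sum_i (p_i - q_i)^2 / (n_cases*p_i + n_controls*q_i)
    """
    total = 0.0
    for p, q in zip(case_freqs, control_freqs):
        denom = n_cases * p + n_controls * q
        if denom > 0:  # skip genotype categories with zero frequency in both groups
            total += (p - q) ** 2 / denom
    return n_cases * n_controls * total


# Hypothetical settings, chosen only to make the example concrete.
case_true = [0.60, 0.32, 0.08]      # genotype frequencies in cases
control_true = [0.70, 0.26, 0.04]   # genotype frequencies in controls
n_cases, n_controls = 500, 500

# Hypothetical non-differential error model (same matrix for both groups).
errors = [
    [0.98, 0.02, 0.00],
    [0.01, 0.98, 0.01],
    [0.00, 0.02, 0.98],
]

ncp_true = chi_square_ncp(case_true, control_true, n_cases, n_controls)
ncp_observed = chi_square_ncp(
    observed_frequencies(case_true, errors),
    observed_frequencies(control_true, errors),
    n_cases,
    n_controls,
)
print(f"NCP without errors: {ncp_true:.3f}")
print(f"NCP with errors:    {ncp_observed:.3f}")
```

Under these assumed settings the NCP computed from the misclassified frequencies is smaller than the NCP computed from the true frequencies, which is the mechanism behind the power loss. Converting either NCP to power would additionally require a non-central chi-square distribution function with 2 degrees of freedom, which is left out to keep the sketch dependency-free.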
Recommended Citation
Zhang, Ruiqi, "Modeling the effect of sequencing error" (2014). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 2439.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/2439