Authors

Ruiqi Zhang

Type

Text

Type

Dissertation

Advisor

Mendell, Nancy | Finch, Stephen J. | Zhu, Wei | Gordon, Derek.

Date

2014-12-01

Keywords

Statistics

Department

Department of Applied Mathematics and Statistics.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/76538

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

Genotype misclassification errors are known to reduce the power to detect genetic association, but the size of this effect in next-generation sequencing (NGS) is not known. For a specified error model at a base pair, the non-centrality parameter (NCP), and hence the power, of the association test allowing for errors was derived. This NCP was compared to the NCP for the usual chi-square test. The asymptotic power was compared to simulated power for specific settings of the true genotype and phenotype frequencies in the case and control populations, the genotype misclassification rates, and the total sample size. An R script for calculating the NCP was provided. Next, the effect of misclassification error was modeled for case-control genetic association studies using NGS data. The Likelihood Ratio Test Allowing for Error using NGS data (LRTNGS) was derived. The genotype frequencies and misclassification rates were estimated from the observed base pair reads using the expectation-maximization (EM) algorithm. The statistic allows for both non-differential and differential misclassification. The distribution of LRTNGS was studied by simulation under both null and alternative settings. The effects of genotype misclassification rates on the sample size needed to maintain constant asymptotic Type I and Type II error rates were studied. For at-risk minor allele frequencies less than 0.01, large sample sizes were required for the asymptotic distribution to be a good approximation. Increasing the sequencing coverage increased the estimated power and the adequacy of simulated power. | 110 pages
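The comparison of NCPs with and without genotyping error mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's R script. It assumes the standard large-sample NCP of the Pearson chi-square test on a 2 x k case-control table, and it assumes an error model in which a hypothetical misclassification matrix `theta` gives the probability of observing genotype j when the true genotype is i, applied identically to cases and controls (non-differential error). The function names and all numeric values are illustrative assumptions.

```python
def observed_freqs(true_freqs, theta):
    """Observed genotype frequencies under an assumed error model.

    theta[i][j] is the (hypothetical) probability that true genotype i
    is recorded as genotype j; each row of theta should sum to 1.
    """
    n_true = len(true_freqs)
    n_obs = len(theta[0])
    return [
        sum(true_freqs[i] * theta[i][j] for i in range(n_true))
        for j in range(n_obs)
    ]


def ncp_chi_square(case_freqs, control_freqs, n_cases, n_controls):
    """Large-sample NCP of the Pearson chi-square test on a 2 x k table.

    Uses NCP = (n_cases * n_controls / N) * sum_j (p_j - q_j)^2 / pbar_j,
    where pbar_j is the sample-size-weighted pooled frequency.
    """
    n_total = n_cases + n_controls
    total = 0.0
    for p, q in zip(case_freqs, control_freqs):
        pooled = (n_cases * p + n_controls * q) / n_total
        if pooled > 0:  # skip empty genotype categories
            total += (p - q) ** 2 / pooled
    return n_cases * n_controls / n_total * total


# Illustrative (made-up) settings: genotype frequencies and a 1% error rate.
cases = [0.60, 0.32, 0.08]
controls = [0.64, 0.32, 0.04]
theta = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]

ncp_true = ncp_chi_square(cases, controls, 500, 500)
ncp_error = ncp_chi_square(
    observed_freqs(cases, theta), observed_freqs(controls, theta), 500, 500
)
print(ncp_true, ncp_error)
```

Under these assumptions the NCP computed from the error-prone frequencies is smaller than the error-free NCP, which is the power loss the abstract refers to. Converting an NCP to power would additionally require the non-central chi-square distribution, for example `pchisq` with its `ncp` argument in R or `scipy.stats.ncx2` in Python.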
