Type
Text
Type
Dissertation
Advisor
Wu, Song | Zhu, Wei | Yang, Jie | Bahou, Wadie
Date
2017-08-01
Keywords
differential expression | Biostatistics | paired data | Poisson distribution | RNA-seq
Department
Department of Applied Mathematics and Statistics
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of the degree.
Identifier
http://hdl.handle.net/11401/78213
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
Next-generation sequencing (NGS) technology provides an attractive platform for genomic studies. RNA-seq uses NGS technology to sequence and quantify the RNA content of samples and thereby reveal their gene expression profiles. In RNA-seq studies, one important objective is to identify genes whose expression differs between two experimental conditions (e.g., control vs. treatment), a task known as differential expression (DE) analysis. Various statistical methods, such as edgeR and DESeq, have been developed for two-sample DE analysis. In practice, however, expression data may come in pairs (e.g., pre- vs. post-treatment samples from the same individual), and new models that incorporate this paired structure are in great demand. In this thesis, we propose a new analysis framework that directly accounts for the paired structure of RNA-seq data and performs paired DE analysis. Normalization is a crucial pre-processing step for DE analysis, yet none of the currently available normalization methods is designed for paired RNA-seq data. We investigated all existing normalization methods through a series of simulation studies to gain insight into their applicability. Based on these results, we propose a customized normalization method (pairedNorm) for paired RNA-seq DE analysis. For the statistical test, we adopt a Poisson model for the paired RNA-seq data and propose a conditional likelihood framework, named pairedBN, for parameter estimation and hypothesis testing. Unlike other DE tests, the proposed method makes no assumption about the distribution of baseline expression levels across samples and places no restriction on the proportion of DE genes within a sample. The conditional likelihood framework eliminates nuisance parameters (e.g., the sample-specific true expression levels), which greatly improves computational efficiency. Furthermore, a non-parametric test procedure can serve as an ad hoc alternative that imposes fewer assumptions on the data.
We conduct an extensive comparison of our method (pairedBN) with the two most popular methods, edgeR and DESeq, through simulation studies. The results show the superiority of pairedBN in FDR control while it maintains good sensitivity. We also apply our method to a paired RNA-seq dataset from TCGA to demonstrate its practical usage.
105 pages
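The abstract does not spell out the exact form of pairedBN, so the following is only a minimal sketch of the standard conditioning argument it alludes to, under assumed notation. Suppose, for one gene and individual i, the pre-treatment count is x_i ~ Poisson(a_i * lambda_i) and the post-treatment count is y_i ~ Poisson(b_i * lambda_i * theta), where lambda_i is the individual's unknown baseline expression (the nuisance parameter), a_i and b_i are known size factors, and theta is the fold change. Conditioning on t_i = x_i + y_i gives y_i | t_i ~ Binomial(t_i, b_i*theta / (a_i + b_i*theta)), in which lambda_i no longer appears. The function name `paired_poisson_lrt`, the parameters `size_x`/`size_y`, and the choice of a likelihood-ratio test with a chi-square(1) reference are hypothetical illustrations, not the thesis's implementation.

```python
import math
from typing import Optional, Sequence, Tuple


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _expit(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _cond_loglik(beta, x, y, offsets):
    # Given t_i = x_i + y_i, y_i | t_i ~ Binomial(t_i, p_i) with
    # logit(p_i) = beta + offset_i; the nuisance lambda_i has cancelled.
    # Binomial coefficients are dropped because they do not involve beta.
    return sum(-yi * _softplus(-(beta + oi)) - xi * _softplus(beta + oi)
               for xi, yi, oi in zip(x, y, offsets))


def _score(beta, x, y, offsets):
    # Derivative of the conditional log-likelihood; strictly decreasing in beta.
    return sum(yi - (xi + yi) * _expit(beta + oi)
               for xi, yi, oi in zip(x, y, offsets))


def paired_poisson_lrt(
    x: Sequence[int],
    y: Sequence[int],
    size_x: Optional[Sequence[float]] = None,
    size_y: Optional[Sequence[float]] = None,
) -> Tuple[float, float, float]:
    """Hypothetical conditional likelihood-ratio test of H0: theta = 1.

    Assumed model: x_i ~ Poisson(size_x[i] * lam_i),
    y_i ~ Poisson(size_y[i] * lam_i * theta), one unknown lam_i per pair.
    Returns (theta_hat, LR statistic, p-value).
    """
    n = len(x)
    size_x = [1.0] * n if size_x is None else list(size_x)
    size_y = [1.0] * n if size_y is None else list(size_y)
    if len(y) != n or len(size_x) != n or len(size_y) != n:
        raise ValueError("all inputs must have the same length")
    if sum(x) == 0 or sum(y) == 0:
        raise ValueError("MLE is infinite when one condition has only zero counts")
    # theta enters as beta = log(theta) plus a known offset log(b_i / a_i).
    offsets = [math.log(sy / sx) for sx, sy in zip(size_x, size_y)]

    # Bracket the root of the monotone score, then bisect.
    lo, hi = -1.0, 1.0
    while _score(lo, x, y, offsets) <= 0:
        lo *= 2.0
    while _score(hi, x, y, offsets) >= 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _score(mid, x, y, offsets) > 0:
            lo = mid
        else:
            hi = mid
    beta_hat = 0.5 * (lo + hi)

    stat = 2.0 * (_cond_loglik(beta_hat, x, y, offsets)
                  - _cond_loglik(0.0, x, y, offsets))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Upper tail of chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return math.exp(beta_hat), stat, p_value
```

For example, `paired_poisson_lrt([10, 20, 15, 8], [20, 41, 29, 17])` estimates a fold change of about 2.02. With equal size factors the estimate reduces to sum(y) / sum(x), because the conditional model then has a single common binomial probability. The per-pair baselines are never estimated, which is the computational saving the abstract attributes to conditioning.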
Recommended Citation
Xu, Jianjin, "A Conditional Likelihood Based Model for Differential Expression Analysis for Paired RNA-seq Data" (2017). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 3708.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/3708