Authors

Yang Liu

Type

Text

Type

Dissertation

Advisor

Wu, Song | Yang, Jie | Galambos, Nora | Zhu, Wei

Date

2016-12-01

Keywords

Statistics

Department

Department of Applied Mathematics and Statistics

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77221

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

Next-generation sequencing (NGS) technology has been widely used in biomedical research, particularly in genomics-related studies. One NGS application is high-throughput mRNA sequencing (RNA-seq), which is commonly used to discover alternative splicing events, to evaluate gene expression levels, and to identify differentially expressed genes. Compared with traditional microarrays, RNA-seq is more efficient and economical. Many useful software tools have been developed for RNA-seq differential expression (DE) analysis, such as edgeR, DESeq, and Cufflinks; however, all of these methods either ignore the isoforms of the mRNA transcript, rely on predefined isoform structures, or depend on de novo isoform reconstruction from the sequencing data, which leads to less accurate inference. In this thesis, we developed and implemented a novel splicing-graph-based negative binomial (SGNB) model for gene differential expression analysis in RNA-seq data. The principle of our model is to move the expression comparison from the unobservable transcript level to the observable read-type level, according to fundamental results of linear algebra. The likelihood ratio test is used to identify DE genes. Computationally, we employed the expectation-maximization (EM) and Newton-Raphson algorithms for parameter estimation. The main advantage of our model is that it accounts for isoforms without requiring predefined isoform structures, and it is therefore expected to be more robust and powerful. Our method also does not require a de novo reconstruction procedure, which saves time and avoids errors in reconstructing isoforms. We performed intensive simulations to compare our new method with one of the most popular packages, edgeR. Under the various scenarios we examined, the results showed that our new model can achieve better power while correctly controlling the false discovery rate. We also applied our method to a real data set to demonstrate its applicability in practice. | 90 pages
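
The likelihood ratio test mentioned in the abstract can be illustrated with a deliberately simplified sketch. This is not the SGNB model itself: it assumes a single read type, two conditions with equal library sizes, and a known, fixed negative binomial size parameter (the inverse of the dispersion), so no EM or Newton-Raphson step is needed. The function names `nb_loglik` and `lrt_two_groups` are hypothetical and chosen only for this illustration.

```python
import math


def nb_loglik(counts, mu, size):
    """Negative binomial log-likelihood with mean `mu` and fixed `size` (1 / dispersion)."""
    total = 0.0
    for y in counts:
        total += (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
                  + size * math.log(size / (size + mu)))
        if y > 0:  # skip the term when y == 0 to avoid log(0) if mu is also 0
            total += y * math.log(mu / (size + mu))
    return total


def lrt_two_groups(group_a, group_b, size):
    """Likelihood ratio test of equal means between two groups of counts.

    Assumption: `size` is known and shared by both groups, in which case the
    maximum likelihood estimate of each mean is simply the sample mean.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    pooled = list(group_a) + list(group_b)
    ll_null = nb_loglik(pooled, mean(pooled), size)            # one common mean
    ll_alt = (nb_loglik(group_a, mean(group_a), size)
              + nb_loglik(group_b, mean(group_b), size))       # one mean per group
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts for one read type under two conditions (illustrative only).
stat, p_value = lrt_two_groups([10, 12, 11], [50, 55, 48], size=10.0)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.2e}")
```

In a full analysis the dispersion would be estimated rather than fixed, and the test would combine several read types per gene, which is where iterative estimation becomes necessary.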
