Type

Text

Type

Dissertation

Advisor

Wu, Song | Yang, Jie | Galambos, Nora | Zhu, Wei

Date

2016-12-01

Keywords

Statistics

Department

Department of Applied Mathematics and Statistics

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77221

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

Next-generation sequencing (NGS) technology has been widely used in biomedical research, particularly in genomics-related studies. One NGS application is high-throughput mRNA sequencing (RNA-seq), which is typically used to discover alternative splicing events, to evaluate gene expression levels, and to identify differentially expressed genes. Compared with traditional microarrays, RNA-seq is more efficient and economical. Many useful software tools have been developed for RNA-seq differential expression (DE) analysis, such as edgeR, DESeq, and Cufflinks; however, all of these methods either ignore the isoforms of mRNA transcripts, rely on predefined isoform structures, or depend on de novo isoform reconstruction from the sequencing data, which leads to less accurate inference. In this thesis, we developed and implemented a novel splicing-graph-based negative binomial (SGNB) model for gene differential expression analysis in RNA-seq data. The principle of our model is to move the expression comparisons from the unobservable transcript level to the observable read-type level, based on fundamental linear algebra theory. A likelihood ratio test is used to identify DE genes. Computationally, we employed the expectation-maximization (EM) and Newton-Raphson algorithms for parameter estimation. The main advantage of our model is that it accounts for isoforms without requiring predefined isoform structures, and it is therefore expected to be more robust and powerful. In addition, our method does not require a de novo reconstruction procedure, which saves time and avoids errors in reconstructing isoforms. We performed extensive simulations to compare our new method with one of the most popular packages, edgeR. Under the various scenarios we examined, the results showed that our new model achieves better power while correctly controlling the false discovery rate. We also applied our method to a real data set to demonstrate its applicability in practice. | 90 pages
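
To make the likelihood ratio test mentioned in the abstract concrete, the following is a minimal sketch of a negative binomial likelihood ratio test for a single feature across two conditions. It is not the SGNB model from the dissertation: it assumes one count per sample, equal library sizes, and a known dispersion `phi`, so the maximum-likelihood mean is simply the sample mean and no EM or Newton-Raphson step is needed. The function names `nb_loglik` and `lrt_two_groups` and the default dispersion value are hypothetical and chosen only for illustration.

```python
import math


def nb_loglik(counts, mu, phi):
    """Negative binomial log-likelihood with mean mu and dispersion phi.

    Parameterized so that variance = mu + phi * mu**2 (assumed, for illustration).
    """
    r = 1.0 / phi  # NB "size" parameter
    ll = 0.0
    for y in counts:
        ll += math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        ll += r * math.log(r / (r + mu))
        if y > 0:  # skip the y * log(...) term when y == 0 (also avoids log(0) when mu == 0)
            ll += y * math.log(mu / (r + mu))
    return ll


def lrt_two_groups(group_a, group_b, phi=0.1):
    """Likelihood ratio test of equal means between two groups of counts.

    With phi held fixed, the MLE of the mean in each group is the sample mean.
    Returns (test statistic, p-value from a chi-square with 1 degree of freedom).
    """
    mu_a = sum(group_a) / len(group_a)
    mu_b = sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    mu_0 = sum(pooled) / len(pooled)

    ll_alt = nb_loglik(group_a, mu_a, phi) + nb_loglik(group_b, mu_b, phi)
    ll_null = nb_loglik(pooled, mu_0, phi)

    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    stat, p = lrt_two_groups([10, 12, 11], [100, 110, 105])
    print(f"LRT statistic = {stat:.2f}, p-value = {p:.3g}")
```

In a genome-wide setting, a test of this kind would be run once per gene and the resulting p-values adjusted for multiple testing to control the false discovery rate; that step is omitted here.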

Recommended Citation

Liu, Yang, "An Isoform-free Model for Differential Expression Analysis in RNA-seq Data" (2016). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 3050.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/3050
