Novel Computational Methodology for Detecting and Quantifying Alternative Splicing from RNA-Seq data
Type
Text
Type
Dissertation
Advisor
Xing, Haipeng | Zhang, Michael Q | Zhu, Wei | Krainer, Adrian.
Date
2015-08-01
Keywords
Alternative splicing, Next generation sequencing, RNA-Seq | Bioinformatics
Department
Department of Applied Mathematics and Statistics.
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/76393
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
Recent development of ultra-high-throughput sequencing of the transcriptome (mRNA-Seq) provides a means of profiling RNA splicing events at unprecedented depth. On the other hand, the ultra-high coverage and the complexity of mRNA-Seq data also create major challenges for computational analysis. My Ph.D. work focuses on developing algorithms to detect, quantify and characterize alternative splicing (AS) from mRNA-Seq data. These algorithms include: (1) OLego, a fast and sensitive splice mapping program for mRNA-Seq data. The most important features of OLego include strategic and efficient searches with very small seeds (12-14 nt), and a built-in regression model to score exon junctions. In addition, OLego does not require any external mapper, and is implemented in C++ with full support for multithreading. As a consequence, OLego has improved sensitivity in junction and exon discovery while maintaining high accuracy and speed. (2) In-house scripts to identify AS events from alignment results of mRNA-Seq data. Instead of constructing full structures of the transcripts, this approach identifies exons and AS events directly from the junction reads to achieve lower complexity and higher sensitivity in detecting splicing events. (3) SpliceTrap, a method to quantify exon inclusion ratios from paired-end mRNA-Seq data using a Bayesian model. The algorithm solves the splicing problem by looking at local splicing events instead of whole transcripts, which enables quantification of exon inclusion ratios without knowing the complete transcript structure. It also utilizes prior information, including the fragment size distribution and inclusion ratio models from highly covered AS events. All of the programs above are splicing-centric tools and can be used to study AS events with high resolution and sensitivity.
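The junction-centric idea in (2) and the inclusion ratio in (3) can be illustrated with a minimal Python sketch. This is not the dissertation's in-house scripts and not SpliceTrap's Bayesian model: the function name `find_cassette_exons`, the input layout (a dictionary of junction coordinates to read counts on a single chromosome and strand), the `max_exon_len` cutoff, and the naive count-based ratio are all assumptions made here for illustration only.

```python
from collections import defaultdict


def find_cassette_exons(junctions, max_exon_len=500):
    """Find cassette exons directly from splice junctions (illustrative only).

    junctions: dict mapping (donor, acceptor) -> junction read count, where
    donor is the last base of the upstream exon and acceptor is the first
    base of the downstream exon (hypothetical layout, one chromosome/strand).
    """
    by_donor = defaultdict(list)
    by_acceptor = defaultdict(list)
    for donor, acceptor in junctions:
        by_donor[donor].append(acceptor)
        by_acceptor[acceptor].append(donor)

    events = []
    # Treat every junction as a candidate skipping junction.
    for (up_donor, down_acceptor), skip_reads in junctions.items():
        # An upstream inclusion junction shares the donor and ends inside the intron.
        for exon_start in by_donor[up_donor]:
            if not up_donor < exon_start < down_acceptor:
                continue
            # A downstream inclusion junction shares the acceptor and starts
            # at or after the exon start.
            for exon_end in by_acceptor[down_acceptor]:
                if not exon_start <= exon_end < down_acceptor:
                    continue
                length = exon_end - exon_start + 1
                if length > max_exon_len:
                    continue
                up_reads = junctions[(up_donor, exon_start)]
                down_reads = junctions[(exon_end, down_acceptor)]
                # Naive ratio: average the two inclusion junctions, then
                # divide by inclusion plus skipping support.
                inclusion = (up_reads + down_reads) / 2
                total = inclusion + skip_reads
                events.append({
                    "exon": (exon_start, exon_end),
                    "length": length,
                    "psi": inclusion / total if total else None,
                })
    return events


# Toy input: a 6 nt exon (201-206) flanked by two inclusion junctions
# and one skipping junction.
example = {(100, 201): 30, (206, 301): 10, (100, 301): 20}
for event in find_cassette_exons(example):
    print(event)
```

Because the sketch works only on local junction triples, it never needs a full transcript model, which is the design point the abstract makes. A real implementation would additionally have to handle strands, multiple chromosomes, read-length and fragment-size effects, and low-coverage events, which the Bayesian model described above is intended to address.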
We have applied this pipeline to many real datasets, including the BodyMap 2.0 data, in which we identified 120,110 cassette exons in the human genome, including 82,528 novel cassette exon events. Strikingly, we identified over 2,000 cassette micro-exons smaller than 27 nt, 105 of which are 6 nt long. Because so little information can be encoded in exons of this size, they serve as an excellent model for studying the functional significance and regulatory mechanisms of AS. | 137 pages
Recommended Citation
Wu, Jie, "Novel Computational Methodology for Detecting and Quantifying Alternative Splicing from RNA-Seq data" (2015). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 2316.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/2316