Type
Text
Type
Dissertation
Advisor
Zhu, Wei | Gao, Yi | Kuan, Pei-Fen | Bahou, Wadie.
Date
2016-12-01
Keywords
barcode, bar-seq, clustering, ET, miRNA-mRNA regulatory network, Sparse modeling | Bioinformatics -- Statistics
Department
Department of Applied Mathematics and Statistics
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/77410
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
This thesis consists of two topics: (1) discovery of microRNA/mRNA regulatory networks in essential thrombocytosis (ET), and (2) a novel ultrafast clustering algorithm to count nucleotide barcode and amplicon reads with errors. The objective of the first study is to discover miRNA-mRNA regulatory networks related to ET, a chronic myeloproliferative disorder with an unregulated surplus of platelets. Complications of ET include stroke, heart attack, and formation of blood clots. While the genetic basis of ET has been studied to some extent, no direct diagnostic test is available to date. In this study, we aim to identify novel ET-related miRNA-mRNA regulatory networks through comparisons of transcriptomes between healthy controls and ET patients. Four network discovery algorithms have been employed: (a) Pearson correlation network, (b) sparse supervised canonical correlation analysis (sparse sCCA), (c) sparse partial correlation network analysis (SPACE), and (d) (sparse) Bayesian network analysis, all through a combination of data-driven and knowledge-based analyses. The results predict a close relationship between 8 miRNAs (including miR-9, miR-490-5p, miR-490-3p, miR-182, miR-34a, miR-196b, miR-34b*, miR-181a-2*) and a 9-mRNA set (including CAV2, LAPTM4B, TIMP1, PKIG, WASF1, MMP1, ERVH-4, NME4, HSD17B12). The majority of the identified variables have been linked to hematologic function by a sizable number of studies. Furthermore, it is observed that the selected mRNAs are highly relevant to ET. The study will shed light on the etiology of ET.
The objective of the second study is to develop an ultrafast and accurate clustering algorithm and software to detect barcodes (certain DNA sequences) and their abundances from raw next-generation barcode sequencing (bar-seq) data. Although bar-seq use has been growing quickly, the computational pipelines for its analysis have not been well developed. Available methods are slow and often result in over-clustering artifacts that group distinct barcodes together. Here, we developed a software package called Bartender, which employs a divide-and-conquer strategy for fast implementation and a modified two-sample proportion test for cluster merging. Additionally, Bartender includes a “multiple time point” mode that matches barcode clusters between different clustering runs for seamless handling of time course data. For both simulated and real data, Bartender clusters millions of unique barcodes in a few minutes at high accuracy (>99.9%), and is ~100-fold faster than previous methods. | 103 pages
Recommended Citation
Zhao, Lu, "On miRNA-mRNA network extraction and ultra-fast nucleotide barcodes clustering algorithm" (2016). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 3225.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/3225