Authors

Xiao Wu

Type

Text

Type

Dissertation

Advisor

Wei Zhu | Haipeng Xing | Xiangmin Jiao | Daniel van der Lelie | Safiyh Taghavi | Ellen Li

Date

2011-08-01

Keywords

Biostatistics -- Statistics

Department

Department of Applied Mathematics and Statistics

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/71731

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

This thesis presents a novel theoretical development and a novel application of the structural equation modeling (SEM) framework, for biological pathway comparisons and biological measurement platform comparisons, respectively. For the methodology development, we extend the covariate structural equation modeling (cSEM) method (Sharpe, 2010) for pathway comparisons, which was limited to continuous variables on the pathway nodes and categorical variables as pathway covariates, to allow both continuous and categorical variables as pathway nodes as well as pathway covariates. This novel mixed-variable cSEM method permits researchers to build a pathway with both continuous variables, such as gene expression levels, and categorical variables, such as genotypes, on the pathway nodes; to compare the pathway between different groups (diseased, normal, etc.); and to evaluate the impact of continuous variables such as age on the pathway links (i.e., connecting patterns and strengths).
Culture-independent phylogenetic analysis of 16S ribosomal RNA gene sequences has emerged as an incisive method of identifying bacteria present in a specimen. However, multiple competing measurement platforms are often available to enumerate bacterial abundances, including Sanger sequencing, pyrosequencing, and quantitative PCR. Here we present a novel application of the latent variable SEM to estimate the reliabilities of, and the similarities between, different measurement platforms and, subsequently, to weight these measures optimally for a unified analysis of the true latent microbiome composition. The latent variable SEM contains the usual repeated measures ANCOVA as a special case and, as a more general, realistic and optimal model, features superior model goodness-of-fit as well as more reliable analysis results.
The third and final contribution of this thesis is the establishment of two bioinformatics pipelines in a systems biology framework that integrate incremental biological knowledge, obtained through the analysis of newly available experimental data, into an existing biological knowledge base, and subsequently evolve that knowledge base to the next level. Two examples, one from the molecular study of human inflammatory bowel diseases and one from the study of endophytic bacteria known to affect the growth rate of certain plants, illustrate these pipelines.
