Type

Text

Type

Dissertation

Advisor

Wu, Song | Zhu, Wei | Ahn, Hongshik | Kotov, Roman

Date

2012-05-01

Keywords

Statistics

Department

Department of Applied Mathematics and Statistics

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/71500

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

The partial correlation is well defined for continuous data and is widely used in network analysis. Its strength lies in its interpretation as the relationship between two variables after the effects of other variables have been removed. We follow up on a recently proposed measure of this kind for categorical data, whose properties have not been well studied. The new partial correlation is defined as the first canonical correlation of the Pearson residuals from logistic regressions. This is analogous to the continuous case, where the partial correlation is obtained by correlating the residuals from linear regressions. A simulation study is presented to examine the properties of the new partial correlation and to compare it with other measures, such as the partial phi coefficient. In the limiting case, the new partial correlation and the partial phi coefficient converge in both estimate and inference. However, the partial phi coefficient cannot be applied to multi-categorical data, and it is not an efficient measure when controlling for more than one variable. The new partial correlation is well defined for the multi-categorical case and can readily control for more than one variable. Because it is derived as a canonical correlation, the new partial correlation can also measure the relationship between a categorical and a continuous variable: it is the multiple correlation between the Pearson residuals from the logistic regression (for the categorical response) and the usual residuals from the linear regression (for the continuous response). With partial correlation networks now obtainable for any data type (continuous, categorical, or mixed), our next goal is to compare the network structure between different groups and to examine the impact of continuous, in addition to categorical, covariates on the pathway connections.
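
As a rough illustration of the definition above, the following sketch computes the residual-based partial correlation for two binary variables given one continuous control variable. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names (`fit_logistic`, `pearson_residuals`, `first_canonical_correlation`, `partial_correlation`) are hypothetical, the logistic fit is a plain Newton-Raphson loop, and the data are simulated. For two binary variables each residual set has a single column, so the first canonical correlation reduces to the absolute correlation between the two residual vectors.

```python
import numpy as np

def fit_logistic(Z, y, n_iter=25):
    """Binary logistic regression by Newton-Raphson; returns fitted probabilities."""
    X = np.column_stack([np.ones(len(y)), Z])      # design matrix with intercept
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * w[:, None])              # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))

def pearson_residuals(Z, y):
    """Pearson residuals (y - p) / sqrt(p (1 - p)) from regressing y on Z."""
    p = fit_logistic(Z, y)
    return (y - p) / np.sqrt(p * (1.0 - p))

def first_canonical_correlation(A, B):
    """Largest canonical correlation between the columns of A and of B."""
    A = np.asarray(A, dtype=float).reshape(len(A), -1)
    B = np.asarray(B, dtype=float).reshape(len(B), -1)
    qa, _ = np.linalg.qr(A - A.mean(axis=0))       # orthonormal basis of centered A
    qb, _ = np.linalg.qr(B - B.mean(axis=0))
    s = np.linalg.svd(qa.T @ qb, compute_uv=False) # singular values, descending
    return float(min(s[0], 1.0))

def partial_correlation(x, y, Z):
    """Residual-based partial correlation of binary x and y, controlling for Z."""
    return first_canonical_correlation(pearson_residuals(Z, x),
                                       pearson_residuals(Z, y))

# Simulated example (assumed data): x and y_indep depend on z only,
# while y_dep also depends on x.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
x = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))
y_indep = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))
y_dep = rng.binomial(1, 1.0 / (1.0 + np.exp(-(z + 2.0 * x - 1.0))))

print("conditionally independent:", partial_correlation(x, y_indep, z))
print("conditionally dependent:  ", partial_correlation(x, y_dep, z))
```

In this simulation the first value should be close to zero and the second clearly larger. Extending the sketch to multi-categorical variables would require a multinomial fit and a residual matrix with one column per non-reference category, which is not shown here.
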
This comparison is accomplished by extending the two-level regression approach for continuous data, originally developed by our research group (Pradhan, 2009), to categorical and mixed-data network analysis. By linearly regressing one first canonical variate on the other and replacing the slope coefficient with an expression in the covariates, we can test for the effect of covariates (both categorical and continuous) on the partial correlation and the network structure. This new covariate partial correlation network analysis approach is illustrated through two studies on the links between human genotypes (single-nucleotide polymorphisms) and disease phenotypes. | 141 pages
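
The idea of replacing the slope with an expression in the covariates can be sketched as an ordinary regression with an interaction term. The abstract does not give the exact model, so everything below is an assumption made for illustration: the variates `u` and `v` are simulated stand-ins for the first canonical variates, the slope is taken to be linear in a single binary covariate `g` (slope = b0 + b1 * g), and `slope_covariate_test` is a hypothetical helper that returns the least-squares estimates and their t statistics.

```python
import numpy as np

def slope_covariate_test(u, v, g):
    """Fit u = a + (b0 + b1 * g) * v + e by least squares.

    Returns (beta, t) for the coefficients [a, b0, b1]. A large |t| for b1
    suggests that the u-v association changes with the covariate g.
    """
    X = np.column_stack([np.ones_like(v), v, v * g])   # b1 enters via v * g
    beta, *_ = np.linalg.lstsq(X, u, rcond=None)
    resid = u - X @ beta
    dof = len(u) - X.shape[1]
    sigma2 = resid @ resid / dof                        # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Simulated example (assumed data): the u-v correlation is 0.1 when g = 0
# and 0.7 when g = 1, so the true b1 is 0.6.
rng = np.random.default_rng(0)
n = 4000
g = rng.integers(0, 2, size=n).astype(float)
v = rng.normal(size=n)
rho = 0.1 + 0.6 * g
u = rho * v + np.sqrt(1.0 - rho**2) * rng.normal(size=n)

beta, t = slope_covariate_test(u, v, g)
print("estimated [a, b0, b1]:", beta)
print("t statistics:         ", t)
```

A continuous covariate could be passed as `g` in the same way. Several covariates would each add their own `v * covariate` column to the design matrix.
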
