Authors

Hyejoo Lee

Type

Text

Type

Dissertation

Advisor

Finch, Stephen | Ahn, Hongshik | Xing, Haipeng | Hong, Sangjin.

Date

2016-12-01

Keywords

Statistics

Department

Department of Applied Mathematics and Statistics

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77177

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

The purpose of this study is to develop a statistical model to predict the risk of developing disease. To enrich our general understanding of schizophrenia, several clustering techniques are used as a preliminary study. Schizophrenia is a heterogeneous disease with great variability in symptoms, cognition, biology and course of illness. Some of this variability may be explained by latent subgroups that differ in etiology and key features. Individuals with paternal age related schizophrenia (PARS) may represent such a subgroup, as evidence suggests a distinct symptom profile. Using K-means and hierarchical clustering on a large sample of schizophrenia patients, this study examines the demographic and clinical characteristics and the distinctiveness of latent PARS subgroups. Despite the wide use of K-means clustering, there remain several issues about how best to implement it. One of the main problems in K-means clustering is how to determine the number of clusters in a data set. We propose a method for choosing the optimal number of clusters. The performance of the proposed method is compared with that of existing methods in simulation experiments. The performance of several classification models on the same schizophrenia data set is also evaluated. Four predictive classification models, Random Forest (RF), Support Vector Machines (SVM), Linear Discriminant Analysis and AdaBoost, are trained and their performance is compared. These models are then used to predict which patients might be at higher risk of developing schizophrenia. For RF and SVM, an adjusted decision threshold is used for a fair comparison. One of the most critical factors in medical diagnosis is an individual’s condition with respect to a given disease, which varies from one person to another. It is difficult to make an appropriate medical decision about a treatment that works for every patient. This study focuses on developing a statistical method to classify the data into two groups: those who are at risk of a potential disease and those who are not. The successful completion of this study will lead to dramatic improvement in medical diagnosis, which will help the development of decision support systems and personalized treatments that focus on specific patient needs.

135 pages
