Authors

Yu-Chuan Chen

Type

Text | Dissertation

Advisor

Ahn, Hongshik | Zhu, Wei | Wu, Song | Zhou, Yiyi.

Date

2014-12-01

Keywords

Statistics | Canonical linear discriminant analysis, Classification, Ensemble, Linear discriminant analysis, Rotation Forest

Department

Department of Applied Mathematics and Statistics.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for the completion of the degree.

Identifier

http://hdl.handle.net/11401/77577

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

In this dissertation, we propose a new classification ensemble method named Canonical Forest. This ensemble method uses canonical linear discriminant analysis (CLDA) and bootstrap resampling to create more accurate and diverse classifiers in an ensemble. Although CLDA is commonly used for dimension reduction, we note that here it serves as a linear transformation tool rather than a dimension reduction tool. Since CLDA finds the transformed space in which the classes are separated farther apart in distribution, classifiers built on this space are more accurate than those built on the original space. To further diversify the classifiers in the ensemble, CLDA is applied only to a partial, mutually exclusive feature subspace for each bootstrap sample. To compare the performance of Canonical Forest with other widely used ensemble methods, including Bagging, AdaBoost, SAMME, Random Forest, and Rotation Forest, we tested them on 29 real and artificial data sets. In addition to classification accuracy, we also investigated the diversity and the bias-variance decomposition of each ensemble method.

Because Canonical Forest cannot be applied directly to high-dimensional data, we propose another version of Canonical Forest, called High-Dimensional Canonical Forest (HDCF), that is specifically designed for high-dimensional data. By incorporating the Random Subspace algorithm into Canonical Forest, we can apply Canonical Forest to high-dimensional data without first performing feature selection or feature reduction. We compared the performance of HDCF with several popular high-dimensional classification algorithms, including SVM, CERP, and Random Forest, using the gene imprinting, estrogen, and leukemia data sets.

122 pages
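The abstract's core recipe (bootstrap sample, partition the features into mutually exclusive subsets, apply CLDA to each subset to build a block rotation, then train a base classifier on the rotated data and combine by majority vote) can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's implementation: the eigen-decomposition of the within/between scatter matrices, the small ridge regularizer, and the nearest-centroid base learner (a stand-in for the decision trees an ensemble like this would normally use) are all assumptions made for brevity. Class labels are assumed to be integers 0..K-1.

```python
import numpy as np

def clda_transform(X, y):
    """CLDA directions: eigenvectors of Sw^{-1} Sb (within/between scatter).
    All directions are kept, so this is a transformation, not a reduction."""
    p = X.shape[1]
    mean = X.mean(axis=0)
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean).reshape(-1, 1)
        Sb += len(Xc) * (d @ d.T)
    # small ridge term so Sw is invertible (an assumption for this sketch)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(p), Sb))
    order = np.argsort(-evals.real)
    return evecs.real[:, order]

def canonical_forest(X, y, n_members=10, n_subsets=2, seed=0):
    """Fit an ensemble: per member, bootstrap + per-subset CLDA rotation."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    ensemble = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), len(X))      # bootstrap sample
        Xb, yb = X[idx], y[idx]
        subsets = np.array_split(rng.permutation(p), n_subsets)
        R = np.zeros((p, p))                       # block-diagonal rotation
        for s in subsets:
            R[np.ix_(s, s)] = clda_transform(Xb[:, s], yb)
        Xr = X @ R                                 # full training set, rotated
        # base classifier: nearest class centroid in the rotated space
        centroids = {c: Xr[y == c].mean(axis=0) for c in np.unique(y)}
        ensemble.append((R, centroids))
    return ensemble

def predict(ensemble, X):
    """Majority vote across ensemble members."""
    votes = []
    for R, centroids in ensemble:
        Xr = X @ R
        labels = list(centroids)
        D = np.stack([np.linalg.norm(Xr - centroids[c], axis=1) for c in labels])
        votes.append(np.array(labels)[D.argmin(axis=0)])
    votes = np.stack(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

In the dissertation's actual method the base learners and the exact CLDA formulation may differ; the point of the sketch is the structure shared with Rotation Forest, i.e. a per-member block rotation learned from class information rather than from principal components.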
