Type
Text | Dissertation
Advisor
Zhu, Wei | Krasnitz, Alexander | Finch, Stephen | Yoon, Seungtai
Date
2014-05-01
Keywords
Clustering, Hierarchical, Randomizations, TBEST | Statistics
Department
Department of Applied Mathematics and Statistics
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/77826
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf | application/vnd.ms-excel
Abstract
One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity. We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. With each of the five datasets, there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques. One dataset uses Cores Of Recurrent Events (CORE) to select features. CORE was developed with my participation in the course of this work. An R language implementation of CORE is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/CORE/index.html. Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of TBEST is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/TBEST/index.html. | 91 pages
Recommended Citation
Sun, Guoli, "Significant distinct branches of hierarchical trees: A framework for statistical analysis and applications to biological data" (2014). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 3594.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/3594