Authors

Shuchu Han

Type

Text

Type

Dissertation

Advisor

Qin, Hong. | Qin, Hong | Wang, Fusheng | Samaras, Dimitris | Orabona, Francesco | Harrison, Robert J.

Date

2017-08-01

Keywords

data mining | Computer science | graph theory | machine learning | sparse graph

Department

Department of Computer Science.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree

Identifier

http://hdl.handle.net/11401/78141

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

The structure of real-world data (in the form of feature matrix) includes crucial information relevant to the performance of machine learning and data mining algorithms. The structure could be local manifold structure, global structure or discriminative information based on the requirements of learning or mining tasks. To model this intrinsic structure, an effective graph representation like k-nearest neighbor graph is necessary. Considering the increasing data size in this digital era, efficient sparse graph representations without parameter tuning are very demanding. In this thesis, we build novel sparse and nonparametric graph representation algorithms for unsupervised learning. The theory foundation of our research works is the similarity graph of Sparse Subspace Clustering. Our research works focus on: (1) alleviate the negative impacts of losing subspace structure assumption about the data: remove non-local edges and generate consistent edge connections, (2) solve the scalability issue for large size data: apply greedy algorithm with ranked dictionaries, (3) applications in unsupervised learning: redundant feature removal for high dimensional data. Moreover, this thesis includes graph structure analysis which connects to the quality of graph following Dense Subgraph theory: (1) data label estimation through dense subgraphs for semi-supervised learning, (2) graph robustness which can differentiate the topology and scale of subgraphs. | 170 pages

Share

COinS