A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

被引:124
作者
Peng, Xi [1 ]
Tang, Huajin [2 ]
Zhang, Lei [2 ]
Yi, Zhang [2 ]
Xiao, Shijie [3 ]
机构
[1] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[2] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
基金
中国国家自然科学基金;
关键词
Error bound analysis; least square regression (LSR); low-rank representation (LRR); out-of-sample problem; scalable subspace clustering; sparse subspace clustering (SSC); SPARSE REPRESENTATION; COLLABORATIVE REPRESENTATION; RANK REPRESENTATION; FACE RECOGNITION; SPECTRAL METHODS; SEGMENTATION; KERNEL;
D O I
10.1109/TNNLS.2015.2490080
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Under the framework of spectral clustering, the key of subspace clustering is building a similarity graph, which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and l(2)-norm-based representation, and have achieved the state-of-the-art performance. However, these methods have suffered from the following two limitations. First, the time complexities of these methods are at least proportional to the cube of the data size, which make those methods inefficient for solving the large-scale problems. Second, they cannot cope with the out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes the representation-based subspace clustering algorithms feasible to cluster both the out-of-sample and the large-scale data. Under our framework, the large-scale problem is tackled by converting it as the out-of-sample problem in the manner of sampling, clustering, coding, and classifying. Furthermore, we give an estimation for the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering a large-scale data set.
引用
收藏
页码:2499 / 2512
页数:14
相关论文
共 60 条
[1]  
Alimoglu F, 1997, PROC INT CONF DOC, P637, DOI 10.1109/ICDAR.1997.620583
[2]  
[Anonymous], 0749 U MASS COLL INF
[3]  
[Anonymous], 1998, 24 CVC AUT U BARC
[4]  
[Anonymous], 1990, Introduction to statistical pattern recognition
[5]  
[Anonymous], 2010, ANAL SOCIAL MEDIA NE
[6]  
[Anonymous], 2010, UCBEECS201013
[7]  
[Anonymous], 2008, PROC IEEE C COMPUT V
[8]   Spectral methods in machine learning and new strategies for very large datasets [J].
Belabbas, Mohamed-Ali ;
Wolfe, Patrick J. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (02) :369-374
[9]   Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables [J].
Blackard, JA ;
Dean, DJ .
COMPUTERS AND ELECTRONICS IN AGRICULTURE, 1999, 24 (03) :131-151
[10]   Document clustering using locality preserving indexing [J].
Cai, D ;
He, XF ;
Han, JW .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (12) :1624-1637