Kernel spectral clustering of large dimensional data

被引:49
作者
Couillet, Romain [1 ]
Benaych-Georges, Florent [2 ]
机构
[1] Univ ParisSud, LSS, CentraleSupelec, Gif Sur Yvette, France
[2] Univ Paris 05, UMR CNRS 8145, MAP 5, Paris, France
关键词
Kernel methods; spectral clustering; random matrix theory; RANDOM MATRICES; RANK PERTURBATIONS; LARGEST EIGENVALUE; SINGULAR-VALUES; FLUCTUATIONS; ESTIMATOR;
D O I
10.1214/16-EJS1144
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
This article proposes a first analysis of kernel spectral clustering methods in the regime where the dimension p of the data vectors to be clustered and their number n grow large at the same rate. We demonstrate, under a k-class Gaussian mixture model, that the normalized Laplacian matrix associated with the kernel matrix asymptotically behaves similar to a so-called spiked random matrix. Some of the isolated eigenvalue-eigenvector pairs in this model are shown to carry the clustering information upon a separability condition classical in spiked matrix models. We evaluate precisely the position of these eigenvalues and the content of the eigenvectors, which unveil important (sometimes quite disruptive) aspects of kernel spectral clustering both from a theoretical and practical standpoints. Our results are then compared to the actual clustering performance of images from the MNIST database, thereby revealing an important match between theory and practice.
引用
收藏
页码:1393 / 1454
页数:62
相关论文
共 33 条
[1]  
[Anonymous], ELECT COMMUN PROBAB
[2]  
Bai ZD, 1998, ANN PROBAB, V26, P316
[3]   Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices [J].
Baik, J ;
Ben Arous, G ;
Péché, S .
ANNALS OF PROBABILITY, 2005, 33 (05) :1643-1697
[4]   Large deviations of the extreme eigenvalues of random deformations of matrices [J].
Benaych-Georges, F. ;
Guionnet, A. ;
Maida, M. .
PROBABILITY THEORY AND RELATED FIELDS, 2012, 154 (3-4) :703-751
[5]   Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices [J].
Benaych-Georges, F. ;
Guionnet, A. ;
Maida, M. .
ELECTRONIC JOURNAL OF PROBABILITY, 2011, 16 :1621-1662
[6]  
BENAYCH-GEORGES F., 2016, ESAIM PROBABILITY ST
[7]  
BENAYCH-GEORGES F., 2016, SMF IN PRESS
[8]   The singular values and vectors of low rank perturbations of large rectangular random matrices [J].
Benaych-Georges, Florent ;
Nadakuditi, Raj Rao .
JOURNAL OF MULTIVARIATE ANALYSIS, 2012, 111 :120-135
[9]   The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices [J].
Benaych-Georges, Florent ;
Nadakuditi, Raj Rao .
ADVANCES IN MATHEMATICS, 2011, 227 (01) :494-521
[10]  
Bishop C., 2006, Pattern recognition and machine learning, P423