Determining the number of components in a factor model from limited noisy data

Cited by: 138
Authors
Kritchman, Shira [1]
Nadler, Boaz [1]
Affiliations
[1] Weizmann Inst Sci, Dept Comp Sci & Appl Math, IL-76100 Rehovot, Israel
Keywords
Pseudorank estimation; Principal component analysis; Random matrix theory; Tracy-Widom distribution; Number of components in a mixture
DOI
10.1016/j.chemolab.2008.06.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Determining the number of components in a linear mixture model is a fundamental problem in many scientific fields, including chemometrics and signal processing. In this paper we present a new method to automatically determine the number of components from a limited number of (possibly) high dimensional noisy samples. The proposed method, based on the eigenvalues of the sample covariance matrix, combines a matrix perturbation approach for the interaction of signal and noise eigenvalues, with recent results from random matrix theory regarding the behavior of noise eigenvalues. We present the theoretical derivation of the algorithm and an analysis of its consistency and limit of detection. Results on simulated data show that under a wide range of conditions our method compares favorably with other common algorithms. (C) 2008 Elsevier B.V. All rights reserved.
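To make the eigenvalue-based idea in the abstract concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it assumes the noise variance `sigma2` is known, and it replaces the Tracy-Widom test and the signal-noise perturbation correction with a fixed threshold at the Marchenko-Pastur upper edge, inflated by a hypothetical safety factor `margin`. The function name and all parameters are illustrative assumptions.

```python
import numpy as np


def estimate_num_components(X, sigma2, margin=1.1):
    """Count sample-covariance eigenvalues that exceed a noise threshold.

    X      : (n, p) array, n samples of dimension p
    sigma2 : noise variance, ASSUMED known here (the paper estimates it)
    margin : hypothetical safety factor on the threshold (an assumption)
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # center each variable
    # Eigenvalues of the p x p sample covariance matrix (ascending order).
    eigvals = np.linalg.eigvalsh(Xc.T @ Xc / n)
    # Asymptotic upper edge of pure-noise eigenvalues (Marchenko-Pastur).
    edge = sigma2 * (1.0 + np.sqrt(p / n)) ** 2
    return int(np.sum(eigvals > margin * edge))


# Usage: two strong components buried in unit-variance noise.
rng = np.random.default_rng(0)
n, p = 500, 50
loadings, _ = np.linalg.qr(rng.standard_normal((p, 2)))  # orthonormal columns
scores = rng.standard_normal((n, 2)) * np.sqrt([20.0, 10.0])
X = scores @ loadings.T + rng.standard_normal((n, p))
k_hat = estimate_num_components(X, sigma2=1.0)
```

A fixed margin is a crude stand-in: a threshold derived from the Tracy-Widom distribution, as the abstract describes, would instead control the false-detection probability at a chosen level.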
Pages: 19-32
Page count: 14
Related papers
39 in total
[1]  
Anderson TW., 1984, INTRO MULTIVARIATE S
[2]   Eigenvalues of large sample covariance matrices of spiked population models [J].
Baik, Jinho ;
Silverstein, Jack W. .
JOURNAL OF MULTIVARIATE ANALYSIS, 2006, 97 (06) :1382-1408
[3]  
Chen ZP, 1999, J CHEMOMETR, V13, P15, DOI 10.1002/(SICI)1099-128X(199901/02)13:1<15::AID-CEM527>3.0.CO;2-I
[5]   A rate of convergence result for the largest eigenvalue of complex white Wishart matrices [J].
El Karoui, Noureddine .
ANNALS OF PROBABILITY, 2006, 34 (06) :2077-2117
[6]   An automated procedure to predict the number of components in spectroscopic data [J].
Elbergali, A ;
Nygren, J ;
Kubista, M .
ANALYTICA CHIMICA ACTA, 1999, 379 (1-2) :143-158
[7]  
Faber K, 1997, J CHEMOMETR, V11, P53, DOI 10.1002/(SICI)1099-128X(199701)11:1<53::AID-CEM434>3.0.CO;2-4
[9]   ASPECTS OF PSEUDORANK ESTIMATION METHODS BASED ON THE EIGENVALUES OF PRINCIPAL COMPONENT ANALYSIS OF RANDOM MATRICES [J].
FABER, NM ;
BUYDENS, LMC ;
KATEMAN, G .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1994, 25 (02) :203-226
[10]   Detection of signals by information theoretic criteria: General asymptotic performance analysis [J].
Fishler, E ;
Grosmann, M ;
Messer, H .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2002, 50 (05) :1027-1036