Exact dimensionality selection for Bayesian PCA

Cited by: 12
Authors
Bouveyron, Charles [1]
Latouche, Pierre [2]
Mattei, Pierre-Alexandre [1]
Affiliations
[1] Univ Cote Azur, UMR CNRS 7135 & Inria Maasai Team, Laboratoire J A Dieudonne, Sophia Antipolis, France
[2] Univ Paris 05, Sorbonne Paris Cite, Laboratoire MAP5, UMR CNRS 8045, Paris, France
Keywords
Bayesian model selection; dimension reduction; marginal likelihood; principal component analysis; singular value decomposition; variable selection; number; components; models; matrix; likelihood
DOI
10.1111/sjos.12424
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood, which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In nonasymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.
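As a rough illustration of what selecting the number of components means in practice, the sketch below scores each candidate dimension q and keeps the best one. It is not the closed-form normal-gamma marginal likelihood described in the abstract, which this record does not reproduce. Instead it swaps in a standard frequentist baseline: the maximized log-likelihood of the probabilistic PCA model penalized by BIC. The function name `select_dimension_bic` and the eigenvalues in `example_eigvals` are hypothetical and chosen only for illustration.

```python
import math


def select_dimension_bic(eigvals, n_samples):
    """Pick the number of components q that minimizes BIC under probabilistic PCA.

    eigvals   : eigenvalues of the sample covariance matrix (all assumed positive)
    n_samples : number of observations used to compute that covariance
    NOTE: this is a BIC baseline, not the exact Bayesian criterion of the paper.
    """
    lam = sorted(eigvals, reverse=True)
    p = len(lam)
    best_q, best_bic = 0, math.inf
    for q in range(p):  # q = p is skipped: no eigenvalue would remain for the noise
        # Maximum likelihood noise variance: mean of the discarded eigenvalues.
        sigma2 = sum(lam[q:]) / (p - q)
        # Maximized probabilistic PCA log-likelihood for q components.
        log_lik = -0.5 * n_samples * (
            p * math.log(2 * math.pi)
            + sum(math.log(v) for v in lam[:q])
            + (p - q) * math.log(sigma2)
            + p
        )
        # Free parameters: loadings up to rotation, noise variance, and the mean.
        n_params = p * q - q * (q - 1) // 2 + 1 + p
        bic = -2.0 * log_lik + n_params * math.log(n_samples)
        if bic < best_bic:
            best_q, best_bic = q, bic
    return best_q


# Hypothetical spectrum with a clear gap after the third eigenvalue.
example_eigvals = [10.2, 9.8, 9.5, 1.1, 1.05, 1.0, 0.98, 0.95, 0.9, 0.85]
print(select_dimension_bic(example_eigvals, n_samples=500))
```

The function takes precomputed eigenvalues rather than raw data so that it depends only on the standard library. An exact Bayesian approach such as the one summarized above would replace the BIC score with the marginal likelihood of each candidate dimension.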
Pages: 196-211
Number of pages: 16