Finite mixture regression: A sparse variable selection by model selection for clustering

Cited by: 17
Author
Devijver, Emilie [1]
Affiliation
[1] Univ Paris Saclay, CNRS, Univ Paris 11, Lab Math Orsay, F-91405 Orsay, France
Keywords
Variable selection; finite mixture regression; non-asymptotic penalized criterion; ℓ1-regularized method; LASSO; RATES
DOI
10.1214/15-EJS1082
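Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract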
We consider a finite mixture of Gaussian regression models for high-dimensional data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by a maximum likelihood estimator restricted to the relevant variables, which are selected by an ℓ1-penalized maximum likelihood estimator. We establish an oracle inequality satisfied by this estimator with a Jensen-Kullback-Leibler type loss. This oracle inequality is deduced from a general model selection theorem for maximum likelihood estimators on a random subcollection of models. From it we derive the shape of the penalty in the selection criterion, which depends on the complexity of the random model collection.
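The procedure in the abstract (ℓ1-penalized selection of relevant variables, an unpenalized maximum likelihood refit restricted to them, then a choice among the resulting models by a penalized criterion) can be sketched as below. This is a minimal sketch under stated assumptions, not the author's estimator: it assumes a scalar response, uses a plain weighted Lasso inside the EM M-step (which may differ from the paper's exact ℓ1-penalized estimator), and replaces the penalty shape derived in the paper with a hypothetical pen = κ·D/n, where D counts free parameters and κ is chosen by the user. All function names (`soft_threshold`, `weighted_lasso`, `em_mixture_regression`, `lasso_mle`) are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    # Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso(X, y, w, lam, n_iter=50):
    # Coordinate descent for 0.5 * sum_i w_i (y_i - x_i b)^2 + lam * ||b||_1
    n, p = X.shape
    b = np.zeros(p)
    a = (w[:, None] * X**2).sum(axis=0)   # weighted squared column norms
    r = y - X @ b                          # current residual
    for _ in range(n_iter):
        for j in range(p):
            if a[j] == 0:
                continue
            r += X[:, j] * b[j]            # residual with variable j removed
            b[j] = soft_threshold(np.sum(w * X[:, j] * r), lam) / a[j]
            r -= X[:, j] * b[j]
    return b


def em_mixture_regression(X, y, K, lam=0.0, n_iter=50, seed=0):
    # EM for y | x ~ sum_k pi_k N(x b_k, s2_k); lam > 0 uses a weighted-Lasso M-step
    rng = np.random.default_rng(seed)
    n, p = X.shape
    tau = rng.dirichlet(np.ones(K), size=n)   # random soft initial assignments
    B = np.zeros((K, p))
    s2 = np.ones(K)
    for _ in range(n_iter):
        # M-step: mixing weights, regression coefficients, variances
        pi = tau.mean(axis=0)
        for k in range(K):
            w = tau[:, k]
            if lam > 0:
                B[k] = weighted_lasso(X, y, w, lam)
            else:
                sw = np.sqrt(w)               # weighted least squares via lstsq
                B[k] = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
            s2[k] = max(np.sum(w * (y - X @ B[k]) ** 2) / w.sum(), 1e-6)
        # E-step: log-densities, log-likelihood of current parameters, posteriors
        logd = (np.log(pi) - 0.5 * np.log(2 * np.pi * s2)
                - 0.5 * (y[:, None] - X @ B.T) ** 2 / s2)
        m = logd.max(axis=1, keepdims=True)
        loglik = np.sum(m[:, 0] + np.log(np.exp(logd - m).sum(axis=1)))
        tau = np.exp(logd - m)
        tau /= tau.sum(axis=1, keepdims=True)
    return pi, B, s2, loglik


def lasso_mle(X, y, K, lambdas, kappa=1.0):
    # For each lambda: select variables with the penalized EM, refit the
    # unpenalized MLE on them, keep the model with the smallest criterion.
    n = len(y)
    best = None
    for lam in lambdas:
        _, B_lasso, _, _ = em_mixture_regression(X, y, K, lam=lam)
        J = np.flatnonzero(np.any(np.abs(B_lasso) > 1e-8, axis=0))  # relevant variables
        if J.size == 0:
            continue
        pi, B, s2, loglik = em_mixture_regression(X[:, J], y, K)
        dim = (K - 1) + K * J.size + K         # weights, coefficients, variances
        crit = -loglik / n + kappa * dim / n   # hypothetical penalty kappa * D / n
        if best is None or crit < best[0]:
            best = (crit, J, pi, B, s2)
    return best
```

For instance, on synthetic data where only the first covariate drives the response, `lasso_mle(X, y, K=2, lambdas=[30.0, 60.0])` returns the criterion value, the selected indices `J`, and the refitted parameters. The λ grid and κ are left as user choices here; the paper's contribution is precisely a theoretically grounded penalty shape, which this sketch does not reproduce.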
Pages: 2642–2674
Number of pages: 33