Estimation and group-feature selection in sparse mixture-of-experts with diverging number of parameters

Cited by: 0
Authors
Khalili, Abbas [1]
Yang, Archer Yi [1, 2]
Da, Xiaonan [3]
Affiliations
[1] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[2] Mila Quebec AI Inst, Montreal, PQ, Canada
[3] Stat Canada, Ottawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Regularization; Variable selection; Mixture-of-experts; NONCONCAVE PENALIZED LIKELIHOOD; MAXIMUM-LIKELIHOOD; VARIABLE SELECTION; FINITE MIXTURE; REGRESSION-MODELS; EM ALGORITHM; IDENTIFIABILITY; REGULARIZATION
DOI
10.1016/j.jspi.2024.106250
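Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract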
Mixture-of-experts provide flexible statistical models for a wide range of regression (supervised learning) problems. In many modern applications, a large number of covariates (features) are available, yet only a small subset of them is useful in explaining a response variable of interest. This calls for a feature selection device. In this paper, we present new group-feature selection and estimation methods for sparse mixture-of-experts models when the number of features can be nearly comparable to the sample size. We prove the consistency of the methods in both parameter estimation and feature selection. We implement the methods using a modified EM algorithm combined with a proximal gradient method, which results in a convenient closed-form parameter update in the M-step of the algorithm. We examine the finite-sample performance of the methods through simulations, and demonstrate their application in a real data example exploring relationships in body measurements.
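To illustrate why a proximal gradient step can give a closed-form update, here is a minimal sketch that assumes a group-lasso type penalty (the Euclidean norm of each coefficient group). The abstract does not state which penalty the paper uses, so the penalty, the function names `group_soft_threshold` and `proximal_gradient_step`, and the convention that each row of `beta` holds one feature's coefficients are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def group_soft_threshold(v, threshold):
    """Proximal operator of threshold * ||v||_2 (assumed group-lasso penalty).

    Shrinks the whole group toward zero, and sets it exactly to zero
    when its Euclidean norm does not exceed the threshold.
    """
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm <= threshold:
        return np.zeros_like(v)
    return (1.0 - threshold / norm) * v

def proximal_gradient_step(beta, grad, step, lam):
    """One proximal gradient update on a coefficient matrix.

    beta, grad: arrays of shape (n_features, n_components); each row is
    treated as one group (hypothetical grouping, one feature per row).
    step: gradient step size; lam: penalty level.
    """
    z = np.asarray(beta, dtype=float) - step * np.asarray(grad, dtype=float)
    return np.vstack([group_soft_threshold(row, step * lam) for row in z])

# Usage: the second feature's group is small enough to be zeroed out,
# which removes that feature from every component at once.
beta = np.array([[3.0, 4.0], [0.3, 0.4]])
grad = np.zeros_like(beta)
updated = proximal_gradient_step(beta, grad, step=1.0, lam=1.0)
```

Under this assumed penalty, the update is a gradient step followed by an explicit shrinkage formula, so no inner numerical optimization is required.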
Pages: 17
Related papers
48 in total
  • [1] STATISTICAL GUARANTEES FOR THE EM ALGORITHM: FROM POPULATION TO SAMPLE-BASED ANALYSIS
    Balakrishnan, Sivaraman
    Wainwright, Martin J.
    Yu, Bin
    [J]. ANNALS OF STATISTICS, 2017, 45 (01) : 77 - 120
  • [2] VALID POST-SELECTION INFERENCE
    Berk, Richard
    Brown, Lawrence
    Buja, Andreas
    Zhang, Kai
    Zhao, Linda
    [J]. ANNALS OF STATISTICS, 2013, 41 (02) : 802 - 837
  • [3] Boyd S., 2004, Convex Optimization
  • [4] Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors
    Breheny, Patrick
    Huang, Jian
    [J]. STATISTICS AND COMPUTING, 2015, 25 (02) : 173 - 187
  • [5] Chamroukhi F, 2019, J SFDS, V160, P57
  • [6] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
  • [7] IDEAL SPATIAL ADAPTATION BY WAVELET SHRINKAGE
    DONOHO, DL
    JOHNSTONE, IM
    [J]. BIOMETRIKA, 1994, 81 (03) : 425 - 455
  • [8] Fan J., 2020, STAT FDN DATA SCI
  • [9] Nonconcave penalized likelihood with a diverging number of parameters
    Fan, JQ
    Peng, H
    [J]. ANNALS OF STATISTICS, 2004, 32 (03) : 928 - 961
  • [10] Variable selection via nonconcave penalized likelihood and its oracle properties
    Fan, JQ
    Li, RZ
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) : 1348 - 1360