A Non-asymptotic Risk Bound for Model Selection in a High-Dimensional Mixture of Experts via Joint Rank and Variable Selection

Cited by: 0
Authors
TrungTin Nguyen [1]
Dung Ngoc Nguyen [2]
Hien Duy Nguyen [3]
Faicel Chamroukhi [4]
Affiliations
[1] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP, LJK, Inria Grenoble Rhone Alpes, 655 Av de l'Europe, F-38335 Grenoble, France
[2] Univ Padua, Dept Stat Sci, Padua, Italy
[3] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[4] IRT SystemX, Palaiseau, France
Source
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II | 2024, Vol. 14472
Keywords
Dimensionality reduction; Low rank estimation; Mixture of experts; Finite mixture regression; Non-asymptotic model selection; Oracle inequality; Variable selection; Maximum likelihood; Minimal penalties; Regression
DOI
10.1007/978-981-99-8391-9_19
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We are motivated by the problem of identifying potentially nonlinear regression relationships between the high-dimensional outputs and high-dimensional inputs of heterogeneous data. This requires performing regression, clustering, and model selection simultaneously. In this framework, we apply mixture of experts models, which are among the most popular ensemble learning techniques developed in the field of neural networks. In particular, we consider a general class of mixture of experts models characterized by multiple Gaussian experts whose means are polynomials of the input variables and whose covariance matrices have block-diagonal structures. Moreover, each expert is weighted by a gating network given by a softmax function of a polynomial of the input variables. These models involve several hyperparameters, including the number of mixture components, the complexity of the softmax gating networks and Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices. We provide a non-asymptotic theory for selecting such complex hyperparameters via the slope heuristic approach within a penalized maximum likelihood estimation framework. Specifically, we establish a non-asymptotic risk bound for the penalized maximum likelihood estimator, which takes the form of an oracle inequality, under lower-bound assumptions on the penalty function.
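As a schematic illustration of the model class described in the abstract (the notation here is generic, not taken verbatim from the paper), the conditional density of a K-component softmax-gated Gaussian mixture of experts can be written as

\[
  s_{\psi}(y \mid x)
  = \sum_{k=1}^{K}
    \frac{\exp\big(w_k(x)\big)}{\sum_{l=1}^{K} \exp\big(w_l(x)\big)}
    \,\phi\big(y;\, \mu_k(x),\, \Sigma_k\big),
\]

where each gating function \(w_k\) and each expert mean \(\mu_k\) is a polynomial of the inputs, \(\phi(\cdot\,; \mu, \Sigma)\) denotes the Gaussian density, and each covariance \(\Sigma_k\) is block-diagonal. Oracle inequalities of the kind announced then typically take the form: for a penalty satisfying \(\mathrm{pen}(m) \gtrsim \dim(S_m)/n\) (up to logarithmic factors),

\[
  \mathbb{E}\big[\mathrm{JKL}(s_0, \widehat{s}_{\widehat{m}})\big]
  \le C \inf_{m \in \mathcal{M}}
      \Big( \inf_{s_m \in S_m} \mathrm{KL}(s_0, s_m) + \mathrm{pen}(m) \Big)
  + \frac{C'}{n},
\]

where \(s_0\) is the true conditional density, \(\{S_m\}_{m \in \mathcal{M}}\) the collection of candidate models, \(\widehat{s}_{\widehat{m}}\) the penalized maximum likelihood estimator, and JKL a Jensen-Kullback-Leibler-type divergence; this mirrors the shape of such bounds in the Gaussian mixture regression literature (e.g., Montuelle and Le Pennec, 2014).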
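The slope heuristic used to calibrate the penalty can also be sketched in a few lines of code. The following is a minimal, hypothetical sketch assuming the common "twice the minimal penalty" variant; the function name, arguments, and the fraction of large models used in the linear fit are illustrative choices, not the authors' implementation.

import numpy as np

def select_by_slope_heuristic(dims, loglik, fit_fraction=0.5):
    # dims   : model dimensions D_m for a grid of fitted candidate models
    # loglik : corresponding maximized log-likelihoods (same order)
    dims = np.asarray(dims, dtype=float)
    loglik = np.asarray(loglik, dtype=float)
    # Estimate the slope s_hat of log-likelihood against dimension on the
    # most complex models, where the likelihood grows roughly linearly.
    order = np.argsort(dims)
    heavy = order[-max(2, int(fit_fraction * len(dims))):]
    s_hat, _ = np.polyfit(dims[heavy], loglik[heavy], deg=1)
    # Slope heuristic: take the final penalty as twice the minimal one,
    # pen(m) = 2 * s_hat * D_m, and select the maximizer of the criterion.
    crit = loglik - 2.0 * s_hat * dims
    return int(np.argmax(crit))

# Hypothetical usage, with dims/loglik obtained from fitting a model grid:
# best = select_by_slope_heuristic([5, 12, 23, 38], [-410.2, -395.8, -389.5, -386.1])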
Pages: 234-245
Number of pages: 12