A Non-asymptotic Risk Bound for Model Selection in a High-Dimensional Mixture of Experts via Joint Rank and Variable Selection

Authors
Nguyen, TrungTin [1]
Nguyen, Dung Ngoc [2]
Nguyen, Hien Duy [3]
Chamroukhi, Faicel [4]
Affiliations
[1] Univ Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, Inria Grenoble Rhone-Alpes, 655 Av de l'Europe, F-38335 Grenoble, France
[2] Univ Padua, Dept Stat Sci, Padua, Italy
[3] Univ Queensland, Sch Math & Phys, Brisbane, Qld, Australia
[4] IRT SystemX, Palaiseau, France
Source
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II | 2024, Vol. 14472
Keywords
Dimensionality reduction; Low rank estimation; Mixture of experts; Finite mixture regression; Non-asymptotic model selection; Oracle inequality; Variable selection; Maximum likelihood; Minimal penalties; Regression
DOI
10.1007/978-981-99-8391-9_19
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We are motivated by the problem of identifying potentially nonlinear regression relationships between high-dimensional outputs and high-dimensional inputs of heterogeneous data. This requires performing regression, clustering, and model selection simultaneously. In this framework, we apply mixture of experts models, which are among the most popular ensemble learning techniques developed in the field of neural networks. In particular, we consider a general class of mixture of experts models characterized by multiple Gaussian experts whose means are polynomials of the input variables and whose covariance matrices have block-diagonal structures. Moreover, each expert is weighted by a gating network that is a softmax function of a polynomial of the input variables. These models involve several hyperparameters, including the number of mixture components, the complexity of the softmax gating networks and Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices. We provide a non-asymptotic theory for selecting such complex hyperparameters using the slope heuristic approach in a penalized maximum likelihood estimation framework. Specifically, we establish a non-asymptotic risk bound for the penalized maximum likelihood estimator, which takes the form of an oracle inequality, under lower-bound assumptions on the penalty function.
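The conditional density described in the abstract (softmax-gated Gaussian experts with polynomial means) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the degree-1 feature map `phi(x) = [1, x]`, and the toy parameter shapes are all assumptions; the paper allows higher-degree polynomial gates and means and block-diagonal covariance structures.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def moe_conditional_density(y, x, gate_w, means_w, covs):
    """p(y|x) = sum_k softmax_k(w_k^T phi(x)) * N(y; B_k phi(x), Sigma_k).

    phi(x) = [1, x] is a degree-1 polynomial feature map (illustrative
    choice); gate_w is (K, p), means_w[k] is (d, p), covs[k] is (d, d).
    """
    phi = np.concatenate(([1.0], x))      # polynomial features of the input
    gates = softmax(gate_w @ phi)         # softmax gating network weights
    d = len(y)
    density = 0.0
    for k in range(len(gates)):
        mu = means_w[k] @ phi             # polynomial mean of expert k
        diff = y - mu
        quad = diff @ np.linalg.solve(covs[k], diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(covs[k]))
        density += gates[k] * np.exp(-0.5 * quad) / norm
    return density
```

In the paper's setting, the hyperparameters to be selected (number of experts K, polynomial degrees of the gates and means, and the covariance block structure) index a collection of such models, and the penalty calibrated by the slope heuristic controls their complexity.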
Pages: 234-245 (12 pages)