Twenty Years of Mixture of Experts

Cited by: 358
Authors
Yuksel, Seniha Esen [1]
Wilson, Joseph N. [1]
Gader, Paul D. [1]
Affiliation
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Funding
U.S. National Science Foundation
Keywords
Applications; Bayesian; classification; comparison; hierarchical mixture of experts (HME); mixture of Gaussian process experts; regression; statistical properties; survey; variational; TIME-SERIES PREDICTION; INDEPENDENT FACE RECOGNITION; SUPPORT VECTOR MACHINES; OF-EXPERTS; HIERARCHICAL MIXTURES; NEURAL-NETWORKS; EM ALGORITHM; ASYMPTOTIC NORMALITY; MAXIMUM-LIKELIHOOD; BAYESIAN-INFERENCE;
DOI
10.1109/TNNLS.2012.2200299
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we provide a comprehensive survey of the mixture of experts (ME). We discuss the fundamental models for regression and classification and also their training with the expectation-maximization algorithm. We follow the discussion with improvements to the ME model and focus particularly on the mixtures of Gaussian process experts. We provide a review of the literature for other training methods, such as the alternative localized ME training, and cover the variational learning of ME in detail. In addition, we describe the model selection literature, which encompasses finding the optimum number of experts as well as the depth of the tree. We present the advances in ME in the classification area and present some issues concerning the classification model. We list the statistical properties of ME, discuss how the model has been modified over the years, compare ME to some popular algorithms, and list several applications. We conclude our survey with future directions and provide a list of publicly available datasets and a list of publicly available software that implements ME. Finally, we provide examples for regression and classification. We believe that the study described in this paper will provide quick access to the relevant literature for researchers and practitioners who would like to improve or use ME, and that it will stimulate further studies in ME.
Pages: 1177-1193
Page count: 17