Improved learning algorithms for mixture of experts in multiclass classification

Cited by: 89
Authors
Chen, K
Xu, L [1]
Chi, H
Affiliations
[1] Peking Univ, Natl Lab Machine Percept, Beijing 100871, Peoples R China
[2] Peking Univ, Ctr Informat Sci, Beijing 100871, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Funding
US National Science Foundation;
Keywords
mixture of experts; multiclass classification; multinomial density; generalized Bernoulli density; Expectation-Maximization (EM) algorithm; Newton-Raphson method; iterative reweighted least squares (IRLS) algorithm; BFGS algorithm;
DOI
10.1016/S0893-6080(99)00043-X
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Mixture of experts (ME) is a modular neural network architecture for supervised learning. A double-loop Expectation-Maximization (EM) algorithm has been introduced to the ME architecture for adjusting the parameters, with the iteratively reweighted least squares (IRLS) algorithm used to perform maximization in the inner loop [Jordan, M.I., & Jacobs, R.A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2), 181-214]. However, it has been reported in the literature that the IRLS algorithm is unstable, and that the ME architecture trained by the EM algorithm with IRLS in the inner loop often performs poorly in multiclass classification. In this paper, the reason for this instability is explored. We find that, owing to an incorrect assumption of parameter independence implicitly imposed in multiclass classification, an incomplete Hessian matrix is used in that IRLS algorithm. Based on this finding, we apply the Newton-Raphson method, which adopts the exact Hessian matrix, to the inner loop of the EM algorithm for multiclass classification. To tackle the expensive computation of the Hessian matrix and its inverse, we propose an approximation to the Newton-Raphson algorithm based on a so-called generalized Bernoulli density. The Newton-Raphson algorithm and its approximation have been applied to synthetic, benchmark, and real-world multiclass classification tasks. For comparison, the IRLS algorithm and a quasi-Newton algorithm, BFGS, have also been applied to the same tasks. Simulation results show that the proposed learning algorithms avoid the instability problem and enable the ME architecture to achieve good performance in multiclass classification. In particular, our approximation algorithm leads to fast learning. The limitation of the approximation algorithm is also empirically investigated in this paper. (C) 1999 Published by Elsevier Science Ltd. All rights reserved.
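To make the abstract's central point concrete, the following LaTeX sketch writes out the standard multinomial-logit (softmax) log-likelihood and its Hessian. The notation (symbols beta_k, p_nk, y_nk) is ours for illustration, not taken from the paper; the off-diagonal cross-class blocks shown here are exactly what an assumed independence between the per-class parameter vectors discards, leaving an incomplete, block-diagonal Hessian of the kind the abstract attributes to the IRLS inner loop.

  \documentclass{article}
  \usepackage{amsmath}
  \begin{document}

  % Softmax (multinomial) model for K classes with parameters \beta_1,\dots,\beta_K:
  \[
    p_{nk} \;=\; \frac{\exp(\beta_k^{\top} x_n)}{\sum_{j=1}^{K} \exp(\beta_j^{\top} x_n)},
    \qquad
    \ell(\beta) \;=\; \sum_{n=1}^{N} \sum_{k=1}^{K} y_{nk} \log p_{nk},
  \]
  % where y_{nk} = 1 if example n belongs to class k, and 0 otherwise.

  % Exact Hessian: the (k,m) block couples the parameters of classes k and m
  % (\delta_{km} is the Kronecker delta).
  \[
    \frac{\partial^2 \ell}{\partial \beta_k \, \partial \beta_m^{\top}}
    \;=\; -\sum_{n=1}^{N} p_{nk}\,\bigl(\delta_{km} - p_{nm}\bigr)\, x_n x_n^{\top}.
  \]

  % Assuming the \beta_k are mutually independent keeps only the diagonal
  % blocks (k = m), i.e. the incomplete Hessian:
  \[
    \frac{\partial^2 \ell}{\partial \beta_k \, \partial \beta_k^{\top}}
    \;\approx\; -\sum_{n=1}^{N} p_{nk}\,\bigl(1 - p_{nk}\bigr)\, x_n x_n^{\top}.
  \]

  \end{document}

A Newton-Raphson step with the full K-by-K block matrix uses the exact curvature; the cost of forming and inverting that matrix is what motivates the paper's faster approximation based on the generalized Bernoulli density.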
Pages: 1229-1252
Number of pages: 24