Extreme Learning Machine for Multilayer Perceptron

Cited by: 1120
Authors
Tang, Jiexiong [1 ]
Deng, Chenwei [1 ]
Huang, Guang-Bin [2 ]
Affiliations
[1] Beijing Institute of Technology, School of Information and Electronics, Beijing 100081, China
[2] Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China
Keywords
Deep learning (DL); deep neural network (DNN); extreme learning machine (ELM); multilayer perceptron (MLP); random feature mapping; recognition; algorithm
DOI
10.1109/TNNLS.2015.2424995
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Extreme learning machine (ELM) is an emerging learning algorithm for generalized single-hidden-layer feedforward neural networks, in which the hidden node parameters are randomly generated and the output weights are analytically computed. However, owing to its shallow architecture, feature learning with ELM may not be effective for natural signals (e.g., images and videos), even with a large number of hidden nodes. To address this issue, this paper proposes a new ELM-based hierarchical learning framework for the multilayer perceptron. The architecture comprises two main components, self-taught feature extraction followed by supervised feature classification, and the two are bridged by randomly initialized hidden weights. The novelties of this paper are threefold: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via an l1 constraint, which yields more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner: once a layer is established, its weights are fixed without fine-tuning, so the framework trains far more efficiently than DL. Extensive experiments on widely used classification data sets show that the proposed algorithm achieves better accuracy and faster convergence than existing state-of-the-art hierarchical learning methods, and multiple computer vision applications further confirm the generality and capability of the proposed learning scheme.
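
The abstract's core mechanism, random hidden nodes plus analytically computed output weights, can be pictured with a minimal sketch of single-hidden-layer ELM training. This is an illustration under assumed settings, not the authors' reference implementation: the sigmoid activation, the uniform weight range, and the names n_hidden and C (the ridge regularization strength) are assumptions.

import numpy as np

def elm_fit(X, T, n_hidden=1000, C=1e3, seed=0):
    """Single-hidden-layer ELM: random input weights, analytic output weights."""
    rng = np.random.default_rng(seed)
    # Hidden node parameters are randomly generated and never tuned.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # random feature mapping
    # Output weights solved analytically via regularized least squares:
    # beta = (H^T H + I / C)^(-1) H^T T.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta   # class scores (argmax for labels) or regression outputs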
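
The l1-constrained ELM sparse autoencoder of novelty 1) can be sketched as follows: a random mapping H of the input X is computed once, and the encoding weights beta are recovered by minimizing (1/2)||H beta - X||^2 + lam * ||beta||_1. The sketch uses plain ISTA rather than the accelerated FISTA solver the paper relies on; lam, the iteration count, and the step-size rule are illustrative assumptions.

def elm_sparse_autoencoder(X, n_hidden=500, lam=1e-3, n_iter=200, seed=0):
    """Recover sparse weights that reconstruct X from a random mapping of X."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # random mapping of the input
    beta = np.zeros((n_hidden, X.shape[1]))
    step = 1.0 / np.linalg.norm(H.T @ H, 2)       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = H.T @ (H @ beta - X)               # gradient of the smooth term
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft thresholding
    return beta   # encode new data as X_new @ beta.T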
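
Novelty 3), forward training without fine-tuning, then amounts to stacking such autoencoder layers and freezing each one before moving on, with a final random projection and analytic classification handled by the ordinary ELM above. In this hedged sketch the layer sizes and the tanh encoding activation are assumptions, and helm_train reuses the two hypothetical helpers defined earlier.

def helm_train(X, T, layer_sizes=(500, 500), n_hidden=2000, C=1e3):
    feats, encoders = X, []
    for n in layer_sizes:
        beta = elm_sparse_autoencoder(feats, n_hidden=n)
        feats = np.tanh(feats @ beta.T)   # encode, then fix this layer's weights
        encoders.append(beta)
    # Supervised stage: ELM random projection + analytic output weights.
    return encoders, elm_fit(feats, T, n_hidden=n_hidden, C=C)

At test time an input runs through each stored beta (with the same tanh encoding) and then through elm_predict.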
Pages: 809-821
Page count: 13