Deep Cascade Learning

Cited by: 74
Authors
Marquez, Enrique S. [1]
Hare, Jonathon S. [1]
Niranjan, Mahesan [1]
Affiliations
[1] Univ Southampton, Dept Elect & Comp Sci, Southampton SO17 1BJ, Hants, England
Keywords
Adaptive learning; cascade correlation; convolutional neural networks (CNNs); deep learning; image classification; NETWORK;
DOI
10.1109/TNNLS.2018.2805098
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel approach for efficient training of deep neural networks in a bottom-up fashion using a layered structure. Our algorithm, which we refer to as deep cascade learning, is motivated by the cascade correlation approach of Fahlman and Lebiere, who introduced it in the context of perceptrons. We demonstrate our algorithm on networks of convolutional layers, though its applicability is more general. Such training of deep networks in a cascade directly circumvents the well-known vanishing gradient problem by ensuring that the output is always adjacent to the layer being trained. We present empirical evaluations comparing our deep cascade training with standard end-to-end training using backpropagation of two convolutional neural network architectures on benchmark image classification tasks (CIFAR-10 and CIFAR-100). We then investigate the features learned by the approach and find that better, domain-specific, representations are learned in early layers when compared to what is learned in end-to-end training. This is partially attributable to the vanishing gradient problem, which prevents early-layer filters from changing significantly from their initial settings. While both networks perform similarly overall, recognition accuracy increases progressively with each added layer, with discriminative features learned in every stage of the network, whereas in end-to-end training no such systematic feature representation was observed. We also show that such cascade training has significant computational and memory advantages over end-to-end training, and can be used as a pretraining algorithm to obtain better performance.
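The training procedure the abstract describes can be sketched in code. The following is a minimal, illustrative toy (not the authors' implementation): it uses small dense layers with a logistic head in place of the paper's convolutional layers, synthetic data in place of CIFAR, and plain gradient descent. The key structural idea is the same: each stage trains one new layer together with a temporary output head, so the supervision signal is always adjacent to the layer being trained; the head is then discarded and the layer frozen before the next stage. All names (`train_stage`, the layer widths, the toy labels) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (illustrative stand-in for CIFAR).
X = rng.normal(size=(200, 8))
y = (X[:, 0] * X[:, 1] > 0).astype(float)  # XOR-like labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_stage(feats, y, hidden, epochs=500, lr=0.1):
    """Train one new hidden layer plus a temporary output head.

    Only W (the new layer) is kept; the head v is discarded after
    the stage, mirroring cascade learning's per-stage output.
    """
    n, d = feats.shape
    W = rng.normal(scale=0.1, size=(d, hidden))   # new layer (kept)
    v = rng.normal(scale=0.1, size=hidden)        # output head (discarded)
    for _ in range(epochs):
        h = np.maximum(feats @ W, 0.0)            # ReLU activations
        p = sigmoid(h @ v)
        # Gradients of the mean logistic loss. Because the head sits
        # directly on top of the layer being trained, the gradient
        # path is short and cannot vanish through many layers.
        g = (p - y) / n
        grad_v = h.T @ g
        grad_W = feats.T @ (np.outer(g, v) * (h > 0))
        v -= lr * grad_v
        W -= lr * grad_W
    return W

# Cascade loop: each stage sees only the frozen features from below.
feats = X
layers = []
for hidden in (16, 16):
    W = train_stage(feats, y, hidden)
    layers.append(W)
    feats = np.maximum(feats @ W, 0.0)  # freeze layer, propagate forward
```

Because earlier stages never receive gradients from later ones, each stage can also be trained with its inputs precomputed once, which is the source of the computational and memory savings the abstract mentions.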
Pages: 5475-5485
Page count: 11