Sequential Training of Neural Networks With Gradient Boosting

Cited by: 7
Authors
Emami, Seyedsaman [1 ]
Martinez-Munoz, Gonzalo [1 ]
Affiliations
[1] Univ Autonoma Madrid, Escuela Politecn Super, Madrid 28049, Spain
Keywords
Gradient boosting; neural network; classifiers
DOI
10.1109/ACCESS.2023.3271515
Chinese Library Classification (CLC)
TP [automation and computer technology]
Subject Classification Code
0812
Abstract
This paper presents a novel technique based on gradient boosting to train the final layers of a neural network (NN). Gradient boosting is an additive expansion algorithm in which a series of models is trained sequentially to approximate a given function. A neural network can also be seen as an additive expansion in which the scalar product of the responses of the last hidden layer and its weights provides the final output of the network. Instead of training the network as a whole, the proposed algorithm trains the network sequentially in T steps. First, the bias term of the network is initialized with a constant approximation that minimizes the average loss over the data. Then, at each step, a portion of the network, composed of J neurons, is trained to approximate the pseudo-residuals on the training data computed from the previous iterations. Finally, the T partial models and the bias are integrated into a single NN with T x J neurons in the hidden layer. Extensive experiments on classification and regression tasks, as well as in combination with deep neural networks, show competitive generalization performance with respect to neural networks trained with standard solvers such as Adam, L-BFGS and SGD, and with respect to deep models. Furthermore, we show that the design of the proposed method permits switching off a number of hidden units at test time (those that were trained last) without a significant reduction in generalization ability, which allows the model to be adapted on the fly to different classification speed requirements.
Pages: 42738-42750
Number of pages: 13
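
The procedure described in the abstract can be summarized as: fit a constant bias, repeatedly fit a small partial model of J hidden neurons to the pseudo-residuals of the current additive prediction, and finally combine the T partial models. Below is a minimal illustrative sketch of that idea for squared-error regression, where the pseudo-residuals are simply y minus the current prediction. The use of scikit-learn's MLPRegressor for the per-step partial models, the shrinkage factor learning_rate, and the keep argument for switching off the last-trained stages are assumptions made for this sketch, not the authors' implementation (which merges the partial models and bias into a single network with T x J hidden units).

# Minimal sketch of sequential NN training with gradient boosting
# (squared loss). Illustrative only; per-step learners and shrinkage
# are assumptions, not the paper's exact implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_boosted_nn(X, y, T=10, J=5, learning_rate=0.1):
    # Step 0: constant bias minimizing the average squared loss -> mean of y.
    bias = float(np.mean(y))
    F = np.full(len(y), bias)              # current additive prediction
    stages = []
    for _ in range(T):
        residuals = y - F                  # pseudo-residuals for squared loss
        # Partial model with J hidden neurons, fit to the residuals.
        stage = MLPRegressor(hidden_layer_sizes=(J,), max_iter=500)
        stage.fit(X, residuals)
        F = F + learning_rate * stage.predict(X)
        stages.append(stage)
    return bias, stages

def predict_boosted_nn(X, bias, stages, learning_rate=0.1, keep=None):
    # 'keep' switches off the last-trained stages, trading accuracy for speed,
    # mirroring the on-the-fly adaptation described in the abstract.
    active = stages if keep is None else stages[:keep]
    out = np.full(X.shape[0], bias)
    for stage in active:
        out = out + learning_rate * stage.predict(X)
    return out

For example, after calling fit_boosted_nn on a regression dataset, predict_boosted_nn(X_test, bias, stages, keep=5) would evaluate only the first five partial models, i.e. the network with the last-trained units switched off.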