Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series

Cited by: 357
Authors
Dal Molin Ribeiro, Matheus Henrique [1, 2]
Coelho, Leandro dos Santos [1, 3]
Affiliations
[1] Pontifical Catholic Univ Parana PUCPR, Grad Program Ind & Syst Engn PPGEPS, 1155 Rua Imaculada Conceicao, BR-80215901 Curitiba, Parana, Brazil
[2] Fed Technol Univ Parana UTFPR, Dept Math, Via Conhecimento,Km 01 Fraron, BR-85503390 Pato Branco, Parana, Brazil
[3] Fed Univ Parana UFPR, Dept Elect Engn, 100 Ave Cel Francisco Heraclito Santos, BR-81530000 Curitiba, Parana, Brazil
Keywords
Bagging; Boosting; Agricultural commodity; Ensemble regression; Stacking; Time series; SUPPORT VECTOR REGRESSION; MACHINE LEARNING TECHNIQUES; GOLD-PRICE FLUCTUATIONS; MODELS; METHODOLOGY; CONSUMPTION; VOLATILITY; ALGORITHMS; PARAMETERS; TREES;
DOI
10.1016/j.asoc.2019.105837
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Assessing the accuracy of methods used to forecast agricultural commodity prices is an important area of study, and effective models are needed for this purpose. Regression ensembles are one option. An ensemble is a set of combined models that act together to forecast a response variable with lower error. The general contribution of this work is to explore the predictive capability of regression ensembles in the agribusiness area by comparing ensembles with one another, as well as with single-model approaches (reference models), when forecasting prices one month ahead. Monthly time series of the price paid to producers in the state of Parana, Brazil for a 60 kg bag of soybean (case study 1) and wheat (case study 2) are used. The ensembles adopted are bagging (random forests - RF), boosting (gradient boosting machine - GBM and extreme gradient boosting machine - XGB), and stacking (STACK). Support vector machine for regression (SVR), multilayer perceptron neural network (MLP), and K-nearest neighbors (KNN) are adopted as reference models. Performance measures such as mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE) are used for model comparison. Friedman and Wilcoxon signed rank tests are applied to evaluate the models' absolute percentage errors (APE). On the test set, MAPE lower than 1% is observed for the best ensemble approaches. The XGB/STACK (Least Absolute Shrinkage and Selection Operator-KNN-XGB-SVR) and RF models showed better performance for short-term forecasting in case studies 1 and 2, respectively. Statistically smaller APE is observed for XGB/STACK and RF than for the reference models. Approaches based on boosting are also consistent, providing good results in both case studies.
The models rank by performance as follows: XGB, GBM, RF, STACK, MLP, SVR and KNN. It can be concluded that the ensemble approach presents statistically significant gains, reducing prediction errors for the price series studied. The use of ensembles is recommended for forecasting agricultural commodity prices one month ahead, since their better performance increases the accuracy of the constructed model and reduces decision-making risk. (C) 2019 Elsevier B.V. All rights reserved.
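The performance measures named in the abstract can be illustrated with a minimal sketch using their standard textbook definitions. The function name `error_measures` and the sample prices are hypothetical and do not come from the paper; the paper's own implementation is not shown in this record.

```python
import math


def error_measures(actual, predicted):
    """Return MAPE (in percent), RMSE, MAE and MSE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    # Absolute percentage error (APE) per observation.
    # Assumption: no actual value is zero, which holds for prices.
    ape = [abs(e / a) * 100.0 for e, a in zip(errors, actual)]
    mse = sum(e * e for e in errors) / n
    return {
        "MAPE": sum(ape) / n,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
    }


# Hypothetical observed and forecast prices, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
measures = error_measures(actual, predicted)
print(measures)
```

The per-observation APE values computed inside the function are the quantities that the abstract says are compared across models with the Friedman and Wilcoxon signed rank tests.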
Pages: 17