Kaggle forecasting competitions: An overlooked learning opportunity

Cited by: 136
Authors
Bojer, Casper Solheim [1]
Meldgaard, Jens Peder [1]
Affiliations
[1] Aalborg Univ, Dept Mat & Prod, Fibigerstr 16, DK-9220 Aalborg, Denmark
Keywords
Time series methods; M competitions; Business forecasting; Forecast accuracy; Machine learning methods; Benchmarking; Time series visualization; Forecasting competition review
DOI
10.1016/j.ijforecast.2020.07.007
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
We review the results of six forecasting competitions based on the online data science platform Kaggle, which have been largely overlooked by the forecasting community. In contrast to the M competitions, the competitions reviewed in this study feature daily and weekly time series with exogenous variables, business hierarchy information, or both. Furthermore, the Kaggle data sets all exhibit higher entropy than the M3 and M4 competitions, and they are intermittent. In this review, we confirm the conclusion of the M4 competition that ensemble models using cross-learning tend to outperform local time series models and that gradient boosted decision trees and neural networks are strong forecast methods. Moreover, we present insights regarding the use of external information and validation strategies, and discuss the impacts of data characteristics on the choice of statistics or machine learning methods. Based on these insights, we construct nine ex-ante hypotheses for the outcome of the M5 competition to allow empirical validation of our findings. © 2020 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Pages: 587-603
Number of pages: 17