Predicting matches in international football tournaments with random forests

Cited by: 17
Authors
Schauberger, Gunther [1, 2]
Groll, Andreas [3]
Affiliations
[1] Tech Univ Munich, Dept Sport & Hlth Sci, Chair Epidemiol, Georg Brauchle Ring 56, D-80992 Munich, Germany
[2] Ludwig Maximilians Univ Munchen, Dept Stat, Munich, Germany
[3] Tech Univ Dortmund, Fac Stat, Dortmund, Germany
Keywords
random forests; football; FIFA World Cups; Poisson regression; regularization; POISSON MODEL; REGRESSION; SELECTION; REGULARIZATION; SCORES;
DOI
10.1177/1471082X18799934
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Many approaches that analyse and predict the results of international football matches are based on statistical models incorporating several covariates that potentially influence a national team's success, such as bookmakers' ratings or the FIFA ranking. Based on all matches from the four previous FIFA World Cups 2002-2014, we compare the predictive performance of the most common regression models built on the teams' covariate information with that of an alternative modelling class, the so-called random forests. Random forests can be seen as a mixture of machine learning and statistical modelling and are known for their high predictive power. Here, we consider two different types of random forests depending on the choice of response. One type predicts the precise number of goals, while the other considers the three match outcomes (win, draw and loss) using special algorithms for ordinal responses. To account for the specific data structure of football matches, in particular at FIFA World Cups, the random forest methods are slightly altered compared to their standard versions and adapted to the needs of the application.
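The abstract distinguishes forests that predict goal counts from forests that predict the ordinal match outcome directly. As a rough illustration of how the two response types relate, the sketch below converts two predicted expected goal numbers into win, draw and loss probabilities. It assumes independent Poisson-distributed goals, a common simplification that the abstract itself does not state; the function names, the truncation at 15 goals and the example values 1.6 and 1.1 are hypothetical and not taken from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a, lam_b, max_goals=15):
    """Return (win, draw, loss) probabilities from team A's perspective.

    Assumption: the two goal counts are independent Poisson variables with
    means lam_a and lam_b. Scores above max_goals are ignored and the result
    is renormalised, so the three values sum to one.
    """
    win = draw = loss = 0.0
    for goals_a in range(max_goals + 1):
        p_a = poisson_pmf(goals_a, lam_a)
        for goals_b in range(max_goals + 1):
            p = p_a * poisson_pmf(goals_b, lam_b)
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p
    total = win + draw + loss
    return win / total, draw / total, loss / total


# Hypothetical expected goals, e.g. as produced by a goal-count model.
win, draw, loss = outcome_probabilities(1.6, 1.1)
print(f"win={win:.3f} draw={draw:.3f} loss={loss:.3f}")
```

A forest for ordinal responses would instead estimate the three outcome probabilities directly, without going through goal counts.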
Pages: 460-482
Page count: 23