Predicting matches in international football tournaments with random forests

Times cited: 17
Authors
Schauberger, Gunther [1, 2]
Groll, Andreas [3]
Affiliations
[1] Technical University of Munich, Department of Sport and Health Sciences, Chair of Epidemiology, Georg-Brauchle-Ring 56, D-80992 Munich, Germany
[2] Ludwig-Maximilians-Universität München, Department of Statistics, Munich, Germany
[3] TU Dortmund University, Faculty of Statistics, Dortmund, Germany
Keywords
Author keywords: random forests; football; FIFA World Cups; Poisson regression; regularization
KeyWords Plus: POISSON MODEL; REGRESSION; SELECTION; REGULARIZATION; SCORES
DOI
10.1177/1471082X18799934
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Many approaches to analysing and predicting the results of international football matches rely on statistical models that incorporate several covariates potentially influencing a national team's success, such as the bookmakers' ratings or the FIFA ranking. Based on all matches from the four FIFA World Cups 2002-2014, we compare the predictive performance of the most common regression models built on the teams' covariate information with that of an alternative model class, the so-called random forests. Random forests can be seen as a hybrid of machine learning and statistical modelling and are known for their high predictive power. Here, we consider two types of random forests that differ in the choice of response. One type predicts the exact number of goals scored, while the other targets the three match outcomes (win, draw and loss) using special algorithms for ordinal responses. To account for the specific data structure of football matches, in particular at FIFA World Cups, the random forest methods are slightly modified from their standard versions and adapted to the needs of the FIFA World Cup application.
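The abstract distinguishes forests that predict goal counts from forests that predict the three ordinal outcomes. One common way to connect the two (an assumption for illustration, not necessarily the procedure used in the paper) is to treat each team's goals as independent Poisson variables with the predicted expected values and to sum the joint probabilities of all scorelines. A minimal sketch, where the function `outcome_probabilities`, the truncation `max_goals` and the goal rates are hypothetical:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a: float, lam_b: float, max_goals: int = 15):
    """Win/draw/loss probabilities for team A.

    Assumes (hypothetically) that the goals of teams A and B are independent
    Poisson variables with expected values lam_a and lam_b. Scorelines above
    max_goals are ignored and the remaining mass is renormalised.
    """
    p_win = p_draw = p_loss = 0.0
    for i in range(max_goals + 1):
        p_i = poisson_pmf(i, lam_a)
        for j in range(max_goals + 1):
            p = p_i * poisson_pmf(j, lam_b)
            if i > j:
                p_win += p
            elif i == j:
                p_draw += p
            else:
                p_loss += p
    total = p_win + p_draw + p_loss  # mass of the truncated score grid
    return p_win / total, p_draw / total, p_loss / total


if __name__ == "__main__":
    # Hypothetical expected goals, e.g. as predicted by a fitted goal model
    win, draw, loss = outcome_probabilities(1.8, 0.9)
    print(f"win={win:.3f} draw={draw:.3f} loss={loss:.3f}")
```

The independence assumption is the simplest choice; bivariate Poisson models relax it by allowing correlated goal counts.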
Pages: 460-482
Number of pages: 23
Cited references: 30