A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes

Authors
Berrar, Daniel [1, 2]
Lopes, Philippe [3]
Dubitzky, Werner
Affiliations
[1] Open Univ, Sch Math & Stat, Machine Learning Res Grp, Milton Keynes, England
[2] Tokyo Inst Technol, Sch Engn, Dept Informat & Commun Engn, Tokyo, Japan
[3] Univ Evry Paris Saclay, Sport & Exercise Sci Dept, Lab Biol Exercice Performance & Sante LBEPS, Evry Courcouronnes, France
Keywords
2023 Soccer Prediction Challenge; k-NN; ordinal forests; naive Bayes; neural networks; outcome prediction; soccer analytics; Super League; association football
DOI
10.1007/s10994-024-06625-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The 2023 Soccer Prediction Challenge invited the machine learning community to develop innovative methods for predicting the outcomes of 736 future soccer matches. The Challenge comprised two tasks. Task 1 was to forecast the exact match score, i.e., the number of goals scored by each team. Task 2 was to predict the match outcome as a probability vector over the three possible result categories: home win, draw, and away win. Here, we present a new data- and knowledge-driven framework for building machine learning models from readily available data to predict soccer match outcomes. A key component of this framework is a novel approach to modeling the interdependent time series data of competing entities. Using this framework, we developed predictive models based on k-nearest neighbors, artificial neural networks, naive Bayes, and ordinal forests, and applied them to both tasks of the 2023 Soccer Prediction Challenge. Among all submissions to the Challenge, our models based on k-nearest neighbors and neural networks achieved top performance. Our main insights from the Challenge are twofold. First, relatively simple learning algorithms perform remarkably well compared with more complex ones. Second, the key to successful prediction lies in how well soccer domain knowledge is incorporated into the modeling process.
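The abstract names k-nearest neighbors as one of the learners applied to Task 2, which asks for a probability vector over home win, draw, and away win. The following Python sketch is for illustration only. It shows one plain way a k-NN can turn past matches into such a vector: take the outcome frequencies among the k most similar past matches. Several elements are assumptions for this example and do not come from the paper. These are the two-dimensional feature vectors, the function name `knn_outcome_probabilities`, the Euclidean distance, and the optional additive smoothing parameter `alpha`. The sketch is not the authors' framework, whose knowledge-driven features are not described in this abstract.

```python
import math
from collections import Counter

# Result categories for Task 2: home win, draw, away win
OUTCOMES = ("H", "D", "A")


def knn_outcome_probabilities(query, history, k=5, alpha=0.0):
    """Estimate (P(home win), P(draw), P(away win)) for an upcoming match.

    query:   numeric feature tuple for the upcoming match (hypothetical features)
    history: list of (features, outcome) pairs, outcome in OUTCOMES
    k:       number of nearest past matches to use
    alpha:   additive smoothing (assumption); alpha > 0 avoids zero probabilities
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    # Rank past matches by Euclidean distance to the query match
    neighbours = sorted(history, key=lambda item: math.dist(query, item[0]))[:k]
    counts = Counter(outcome for _, outcome in neighbours)
    # Smoothed relative frequency of each outcome among the k neighbours
    denom = k + alpha * len(OUTCOMES)
    return tuple((counts[o] + alpha) / denom for o in OUTCOMES)


# Toy data: hypothetical (feature_1, feature_2) vectors with observed results
history = [
    ((1.2, 0.3), "H"),
    ((0.9, 0.4), "H"),
    ((0.1, 1.5), "A"),
    ((0.5, 0.5), "D"),
    ((1.0, 0.2), "H"),
    ((0.2, 1.1), "A"),
]

if __name__ == "__main__":
    print(knn_outcome_probabilities((1.0, 0.3), history, k=4))  # (0.75, 0.25, 0.0)
```

With `alpha=0`, an outcome that never occurs among the neighbours receives probability zero, as the away win does above. A small positive `alpha` spreads some mass to every category, and the vector still sums to one.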
Pages: 8165-8204
Number of pages: 40