A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes

Cited by: 0
Authors
Berrar, Daniel [1, 2]
Lopes, Philippe [3]
Dubitzky, Werner
Affiliations
[1] Open Univ, Sch Math & Stat, Machine Learning Res Grp, Milton Keynes, England
[2] Tokyo Inst Technol, Sch Engn, Dept Informat & Commun Engn, Tokyo, Japan
[3] Univ Evry Paris Saclay, Sport & Exercise Sci Dept, Lab Biol Exercice Performance & Sante LBEPS, Evry Courcouronnes, France
Keywords
2023 soccer prediction challenge; k-NN; Ordinal forests; Naive Bayes; Neural networks; Outcome prediction; Soccer analytics; Super league; Association football
DOI
10.1007/s10994-024-06625-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The 2023 Soccer Prediction Challenge invited the machine learning community to develop innovative methods to predict the outcomes of 736 future soccer matches. The Challenge included two tasks. Task 1 was to forecast the exact match score, i.e., the number of goals scored by each team. Task 2 was to predict the match outcome as a probability vector over the three possible result categories: victory of the home team, draw, and victory of the away team. Here, we present a new data- and knowledge-driven framework for building machine learning models from readily available data to predict soccer match outcomes. A key component of this framework is an innovative approach to modeling interdependent time series data of competing entities. Using this framework, we developed various predictive models based on k-nearest neighbors, artificial neural networks, naive Bayes, and ordinal forests, which we applied to the two tasks of the 2023 Soccer Prediction Challenge. Among all submissions to the Challenge, our machine learning models based on k-nearest neighbors and neural networks achieved top performance. Our main insights from the Challenge are that relatively simple learning algorithms perform remarkably well compared to more complex algorithms, and that the key to successful predictions lies in how well soccer domain knowledge can be incorporated into the modeling process.
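As a rough illustration of what a Task 2 prediction looks like, the sketch below uses a plain k-nearest-neighbors vote to turn past matches into a probability vector over home win, draw, and away win. It is not the authors' model: the two features (a rating difference and a home-advantage flag), the toy data, and the function name `knn_outcome_probabilities` are all assumptions made for the example.

```python
import math
from collections import Counter

OUTCOMES = ("home_win", "draw", "away_win")

def knn_outcome_probabilities(train_features, train_outcomes, query, k=3):
    """Return a probability vector over OUTCOMES from the k nearest training matches."""
    # Euclidean distance from the query match to every training match.
    distances = [
        (math.dist(features, query), outcome)
        for features, outcome in zip(train_features, train_outcomes)
    ]
    # Keep the k closest matches (sorting by distance only; the sort is stable).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    counts = Counter(outcome for _, outcome in nearest)
    # Relative outcome frequencies among the neighbors form the probability vector.
    return [counts[outcome] / len(nearest) for outcome in OUTCOMES]

# Hypothetical toy data: (home rating minus away rating, home-advantage flag).
train_features = [(1.2, 1.0), (0.9, 1.0), (0.1, 1.0), (-0.2, 1.0), (-1.1, 1.0), (-0.8, 1.0)]
train_outcomes = ["home_win", "home_win", "draw", "draw", "away_win", "away_win"]

probs = knn_outcome_probabilities(train_features, train_outcomes, query=(0.8, 1.0), k=3)
print(dict(zip(OUTCOMES, probs)))
```

With these toy values the three nearest matches are two home wins and one draw, so the vector puts two thirds of the probability on a home win and one third on a draw. A real model would draw its features from match histories and domain knowledge, as the abstract emphasizes.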
Pages: 8165-8204
Number of pages: 40