A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes

Authors
Berrar, Daniel [1, 2]
Lopes, Philippe [3]
Dubitzky, Werner
Affiliations
[1] Open Univ, Sch Math & Stat, Machine Learning Res Grp, Milton Keynes, England
[2] Tokyo Inst Technol, Sch Engn, Dept Informat & Commun Engn, Tokyo, Japan
[3] Univ Evry Paris Saclay, Sport & Exercise Sci Dept, Lab Biol Exercice Performance & Sante LBEPS, Evry Courcouronnes, France
Keywords
2023 soccer prediction challenge; k-NN; Ordinal forests; Naive Bayes; Neural networks; Outcome prediction; Soccer analytics; Super league; Association football
DOI
10.1007/s10994-024-06625-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The 2023 Soccer Prediction Challenge invited the machine learning community to develop innovative methods to predict the outcomes of 736 future soccer matches. The Challenge included two tasks. Task 1 was to forecast the exact match score, i.e., the number of goals scored by each team. Task 2 was to predict the match outcome as a probability vector over the three possible result categories: home win, draw, and away win. Here, we present a new data- and knowledge-driven framework for building machine learning models from readily available data to predict soccer match outcomes. A key component of this framework is an innovative approach to modeling interdependent time series data of competing entities. Using this framework, we developed various predictive models based on k-nearest neighbors, artificial neural networks, naive Bayes, and ordinal forests, and applied them to both tasks of the 2023 Soccer Prediction Challenge. Among all submissions to the Challenge, our models based on k-nearest neighbors and neural networks achieved top performance. Our main insights from the Challenge are that relatively simple learning algorithms perform remarkably well compared with more complex ones, and that the key to successful prediction lies in how well soccer domain knowledge is incorporated into the modeling process.
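To make the Task 2 output format concrete, here is a minimal Python sketch of how a k-nearest-neighbors model can turn past matches into a probability vector over home win, draw, and away win. It is an illustration under stated assumptions, not the authors' implementation: the two numeric features per match, the function name `knn_outcome_probabilities`, the Euclidean distance, k = 3, and the toy training data are all hypothetical.

```python
import math
from collections import Counter

# Result categories in Task 2 order: home win, draw, away win.
OUTCOMES = ("H", "D", "A")


def knn_outcome_probabilities(query, training, k=3):
    """Return (P(H), P(D), P(A)) for one upcoming match.

    Hypothetical setup: `query` is a tuple of numeric features describing
    the upcoming match, and `training` is a list of (features, outcome)
    pairs from past matches, where outcome is "H", "D", or "A".
    """
    # Rank past matches by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    counts = Counter(outcome for _, outcome in neighbours)
    # The probability of each outcome is its share among the k nearest neighbours.
    return tuple(counts[o] / k for o in OUTCOMES)


# Toy past matches with two made-up features each (purely illustrative).
past_matches = [
    ((0.5, 0.1), "H"),
    ((0.4, 0.0), "H"),
    ((0.0, 0.0), "D"),
    ((-0.5, -0.2), "A"),
    ((0.45, 0.05), "D"),
]

probs = knn_outcome_probabilities((0.5, 0.05), past_matches, k=3)
print(probs)  # (0.6666666666666666, 0.3333333333333333, 0.0)
```

Using raw neighbor frequencies can assign a probability of exactly zero to an outcome (here, the away win); smoothing the counts is one possible adjustment, and the abstract does not say how the authors handled this.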
Pages: 8165–8204
Page count: 40