PGD: A Large-scale Professional Go Dataset for Data-driven Analytics

Times cited: 2
Author
Gao, Yifan [1]
Institution
[1] University of Science and Technology of China, School of Biomedical Engineering, Division of Life Sciences and Medicine, Hefei, People's Republic of China
Keywords
Go; game analytics; data mining; board game; chess; players
DOI
10.1109/CoG51982.2022.9893704
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Lee Sedol is on a winning streak: will this legend rise again after his match against AlphaGo? Ke Jie is invincible in the world championship: can he still win the title this time? Go is one of the most popular board games in East Asia, supported by professional sports systems in China, Japan, and Korea that have remained stable for decades. Mature data-driven analysis technologies exist for many sports, such as soccer, basketball, and esports. Developing comparable technology for Go, however, remains challenging because of the lack of datasets, meta-information, and in-game statistics. This paper introduces the Professional Go Dataset (PGD), which contains 98,043 games played by 2,148 professional players from 1950 to 2021. After manual cleaning and labeling, we provide detailed meta-information for each player, game, and tournament. The dataset also includes an evaluation of every move in each game by an advanced AlphaZero-based AI. To establish a benchmark on PGD, we further analyze the data and, drawing on prior knowledge of Go, extract meaningful in-game features that indicate the state of the game. Using the complete meta-information and the constructed in-game features, our game-result prediction system achieves an accuracy of 75.30%, well above the 64%-65% achieved by several state-of-the-art approaches. To the best of our knowledge, PGD is the first dataset for data-driven analytics in Go, and indeed in board games generally. Beyond this promising result, we provide further examples of tasks that can benefit from the dataset. The ultimate goal of this paper is to bridge this ancient game and the modern data science community. The dataset will advance research on Go-related analytics, enhancing the fan experience, helping players improve their skills, and supporting other promising applications. The dataset will be made publicly available.
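The abstract does not specify how in-game features are derived from the per-move AI evaluations. As a purely illustrative sketch, assume each move record carries the mover's colour and an AI-estimated probability that Black wins after the move (the `MoveEval` layout and field names are hypothetical, not PGD's actual schema). Per-player features such as the average win-rate loss per move and a count of large mistakes could then be computed as below; the 0.5 starting win rate and the 0.10 "blunder" threshold are arbitrary assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MoveEval:
    """One AI-evaluated move (hypothetical record layout, not PGD's schema)."""
    player: str           # "B" or "W": colour of the player who made the move
    winrate_black: float  # AI-estimated probability that Black wins after the move


def in_game_features(moves, start_winrate_black=0.5, blunder_threshold=0.10):
    """Hypothetical per-player features from per-move AI win-rate estimates.

    Each move's win-rate change is measured from the mover's own
    perspective; any drop counts as a loss for that player.
    """
    losses = {"B": [], "W": []}
    prev = start_winrate_black  # assumed win rate before the first move
    for m in moves:
        delta_black = m.winrate_black - prev
        # A gain for Black is a loss for White, and vice versa.
        gain = delta_black if m.player == "B" else -delta_black
        losses[m.player].append(max(0.0, -gain))
        prev = m.winrate_black

    features = {}
    for player, player_losses in losses.items():
        features[f"{player}_mean_loss"] = mean(player_losses) if player_losses else 0.0
        features[f"{player}_blunders"] = sum(1 for x in player_losses if x >= blunder_threshold)
    features["final_winrate_black"] = prev
    return features


if __name__ == "__main__":
    game = [
        MoveEval("B", 0.52),
        MoveEval("W", 0.40),
        MoveEval("B", 0.45),
        MoveEval("W", 0.60),  # White's move raises Black's win rate by 0.15
    ]
    print(in_game_features(game))
```

Features like these could be combined with player and tournament meta-information as inputs to a result classifier; the paper's actual feature set and prediction model are not described in this record.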
Pages: 284-291
Page count: 8
Related papers
50 in total
  • [1] PGD: A Large-scale Professional Go Dataset for Data-driven Analytics
    Gao, Yifan
    arXiv, 2022
  • [2] Data-Driven Crowd Understanding: A Baseline for a Large-Scale Crowd Dataset
    Zhang, Cong
    Kang, Kai
    Li, Hongsheng
    Wang, Xiaogang
    Xie, Rong
    Yang, Xiaokang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (06) : 1048 - 1061
  • [3] mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics
    Mirarchi, Antonio
    Giorgino, Toni
    De Fabritiis, Gianni
    SCIENTIFIC DATA, 2024, 11 (01)
  • [4] The Piraeus AIS dataset for large-scale maritime data analytics
    Tritsarolis, Andreas
    Kontoulis, Yannis
    Theodoridis, Yannis
    DATA IN BRIEF, 2022, 40
  • [5] WaterBench-Iowa: a large-scale benchmark dataset for data-driven streamflow forecasting
    Demir, Ibrahim
    Xiang, Zhongrun
    Demiray, Bekir
    Sit, Muhammed
    EARTH SYSTEM SCIENCE DATA, 2022, 14 (12) : 5605 - 5616
  • [6] A Data-driven Mechanism for Large-scale Data Distribution
    Shi Peichang
    Li Yiying
    Ding Bo
    Jiang Longquan
    Liu Hui
    Zhang Jie
    2016 WORLD AUTOMATION CONGRESS (WAC), 2016
  • [7] Data-driven Authoring of Large-scale Ecosystems
    Kapp, Konrad
    Gain, James
    Guerin, Eric
    Galin, Eric
    Peytavie, Adrien
    ACM TRANSACTIONS ON GRAPHICS, 2020, 39 (06)
  • [8] Large-scale Data-driven Segmentation of Banking Customers
    Hossain, Md Monir
    Sebestyen, Mark
    Mayank, Dhruv
    Ardakanian, Omid
    Khazaei, Hamzeh
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020 : 4392 - 4401
  • [9] Data-driven realistic animation of large-scale forest
    School of Computer Science, Wuhan University, Wuhan 430079, China
    Unknown
    Unknown
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2008, 20 (08) : 1015 - 1022
  • [10] Large-scale mode identification and data-driven sciences
    Mukhopadhyay, Subhadeep
    ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (01) : 215 - 240