Model-based reinforcement learning for alternating Markov games

被引:0
|
作者
Mellor, D [1 ]
机构
[1] Univ Newcastle, Sch Elect Engn & Comp Sci, Callaghan, NSW 2308, Australia
来源
AI 2003: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2003年 / 2903卷
关键词
game playing; machine learning; planning; reinforcement learning; search;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Online training is a promising technique for training reinforcement learning agents to play strategy board games over the internet against human opponents. But the limited training experience that can be generated by playing against real humans online means that learning must be data-efficient. Data-efficiency has been achieved in other domains by augmenting reinforcement learning with a model: model-based reinforcement learning. In this paper the Minimax-MBTD algorithm is presented, which extends model-based reinforcement learning to deterministic alternating Markov games, a generalisation of two-player zero-sum strategy board games like chess and Go. By using a minimax measure of optimality the strategy learnt generalises to arbitrary opponents, unlike approaches that explicitly model specific opponents. Minimax-MBTD is applied to Tic-Tac-Toe and found to converge faster than direct reinforcement learning, but focussing planning on successors to the current state resulted in slower convergence than unfocussed random planning.
引用
收藏
页码:520 / 531
页数:12
相关论文
共 50 条
  • [1] A survey on model-based reinforcement learning
    Luo, Fan-Ming
    Xu, Tian
    Lai, Hang
    Chen, Xiong-Hui
    Zhang, Weinan
    Yu, Yang
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (02)
  • [2] Model-based average reward reinforcement learning
    Tadepalli, P
    Ok, D
    ARTIFICIAL INTELLIGENCE, 1998, 100 (1-2) : 177 - 224
  • [3] Efficient hyperparameter optimization through model-based reinforcement learning
    Wu, Jia
    Chen, SenPeng
    Liu, XiYuan
    NEUROCOMPUTING, 2020, 409 : 381 - 393
  • [4] High-accuracy model-based reinforcement learning, a survey
    Plaat, Aske
    Kosters, Walter
    Preuss, Mike
    ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (09) : 9541 - 9573
  • [5] High-accuracy model-based reinforcement learning, a survey
    Aske Plaat
    Walter Kosters
    Mike Preuss
    Artificial Intelligence Review, 2023, 56 : 9541 - 9573
  • [6] Model-Based Reinforcement Learning in Robotics: A Survey
    Sun S.
    Lan X.
    Zhang H.
    Zheng N.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (01): : 1 - 16
  • [7] Asynchronous Methods for Model-Based Reinforcement Learning
    Zhang, Yunzhi
    Clavera, Ignasi
    Tsai, Boren
    Abbeel, Pieter
    CONFERENCE ON ROBOT LEARNING, VOL 100, 2019, 100
  • [8] Model gradient: unified model and policy learning in model-based reinforcement learning
    Jia, Chengxing
    Zhang, Fuxiang
    Xu, Tian
    Pang, Jing-Cheng
    Zhang, Zongzhang
    Yu, Yang
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (04)
  • [9] Model gradient: unified model and policy learning in model-based reinforcement learning
    Chengxing Jia
    Fuxiang Zhang
    Tian Xu
    Jing-Cheng Pang
    Zongzhang Zhang
    Yang Yu
    Frontiers of Computer Science, 2024, 18
  • [10] Model-Based Transfer Reinforcement Learning Based on Graphical Model Representations
    Sun, Yuewen
    Zhang, Kun
    Sun, Changyin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (02) : 1035 - 1048