Model-based reinforcement learning for alternating Markov games

Cited by: 0

Authors:

Mellor, D [1]

Affiliation:

[1] Univ Newcastle, Sch Elect Engn & Comp Sci, Callaghan, NSW 2308, Australia

Source:

AI 2003: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2003 / Vol. 2903

Keywords:

game playing; machine learning; planning; reinforcement learning; search

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Online training is a promising technique for training reinforcement learning agents to play strategy board games over the internet against human opponents. But the limited training experience that can be generated by playing against real humans online means that learning must be data-efficient. Data-efficiency has been achieved in other domains by augmenting reinforcement learning with a model: model-based reinforcement learning. In this paper the Minimax-MBTD algorithm is presented, which extends model-based reinforcement learning to deterministic alternating Markov games, a generalisation of two-player zero-sum strategy board games like chess and Go. By using a minimax measure of optimality the strategy learnt generalises to arbitrary opponents, unlike approaches that explicitly model specific opponents. Minimax-MBTD is applied to Tic-Tac-Toe and found to converge faster than direct reinforcement learning, but focussing planning on successors to the current state resulted in slower convergence than unfocussed random planning.
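
The idea in the abstract (learn a model of a deterministic alternating game from play, then plan with minimax backups over that model) can be illustrated with a minimal sketch. This is not the paper's Minimax-MBTD algorithm, whose details are not given in this record. It is a simplified tabular variant under stated assumptions: self-play with epsilon-greedy moves instead of human opponents, rewards of +1/-1/0 from X's point of view, no discounting, and unfocussed planning that backs up randomly chosen previously seen states. All function and parameter names are hypothetical.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def to_move(board):
    """X moves first, so X is to move whenever the mark counts are equal."""
    return 'X' if board.count('X') == board.count('O') else 'O'

def step(board, action):
    """True game dynamics: returns (next_board, reward_for_X, done)."""
    nxt = board[:action] + to_move(board) + board[action + 1:]
    w = winner(nxt)
    if w:
        return nxt, (1.0 if w == 'X' else -1.0), True
    return nxt, 0.0, ' ' not in nxt

def backed_up(V, entry):
    """Value of one modelled transition: reward plus value of the successor."""
    nxt, reward, done = entry
    return reward + (0.0 if done else V.get(nxt, 0.0))

def minimax_backup(V, model, board):
    """Set V[board] to the max (X to move) or min (O to move) over modelled moves."""
    outcomes = [backed_up(V, e) for e in model[board].values()]
    V[board] = max(outcomes) if to_move(board) == 'X' else min(outcomes)

def train(episodes=2000, planning_steps=20, epsilon=0.2, seed=0):
    """Self-play with a learned deterministic model and random planning backups."""
    rng = random.Random(seed)
    V, model, seen = {}, {}, []
    for _ in range(episodes):
        board, done = ' ' * 9, False
        while not done:
            legal = [i for i, c in enumerate(board) if c == ' ']
            known = model.get(board, {})
            if rng.random() < epsilon or not known:
                action = rng.choice(legal)           # explore
            else:                                    # greedy among modelled moves
                pick = max if to_move(board) == 'X' else min
                action = pick(known, key=lambda a: backed_up(V, known[a]))
            nxt, reward, done = step(board, action)
            if board not in model:                   # record the observed transition
                model[board] = {}
                seen.append(board)
            model[board][action] = (nxt, reward, done)
            minimax_backup(V, model, board)          # direct update from real experience
            for _ in range(planning_steps):          # unfocussed planning on the model
                minimax_backup(V, model, rng.choice(seen))
            board = nxt
    return V, model
```

Calling `train()` returns the value table and the learned model. Setting `planning_steps=0` reduces the sketch to learning from real moves only, which is one way to compare against the direct reinforcement learning baseline mentioned in the abstract.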


Pages: 520-531

Page count: 12
