Implementing action mask in proximal policy optimization (PPO) algorithm

Cited by: 33
Authors
Tang, Cheng-Yen [1 ]
Liu, Chien-Hung [1 ]
Chen, Woei-Kae [1 ]
You, Shingchern D. [1 ]
Affiliations
[1] Natl Taipei Univ Technol, Dept Comp Sci & Informat Engn, Taipei, Taiwan
Source
ICT EXPRESS | 2020, Vol. 6, No. 3
Keywords
PPO; Invalid action; Reinforcement learning;
DOI
10.1016/j.icte.2020.05.003
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask to the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, compared with the original version, the proposed algorithm yields a much higher return within a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask when applicable. (C) 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V.
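For context, the masking described in the abstract is commonly realized by overwriting the logits of invalid actions before the softmax, so that invalid actions receive near-zero probability both when sampling and when computing the PPO probability ratio. The following is a minimal sketch under those assumptions, not the authors' code; the helper name masked_categorical, the -1e8 constant, and the PyTorch-based setup are illustrative.

    # Minimal sketch (illustrative, not the paper's implementation):
    # apply a per-state action mask to policy logits so invalid actions
    # get (near-)zero probability before sampling and before the PPO update.
    import torch

    def masked_categorical(logits: torch.Tensor, action_mask: torch.Tensor):
        """logits: (batch, n_actions); action_mask: (batch, n_actions), 1 = valid, 0 = invalid."""
        # Push invalid-action logits to a very large negative value so the
        # softmax assigns them probability ~0 and they are never sampled.
        masked_logits = torch.where(action_mask.bool(), logits,
                                    torch.full_like(logits, -1e8))
        return torch.distributions.Categorical(logits=masked_logits)

    # Usage sketch:
    #   dist = masked_categorical(policy_net(obs), mask_from_env(obs))
    #   action = dist.sample(); log_prob = dist.log_prob(action)
    #   The same masked distribution is used for the new log-probabilities when
    #   forming the PPO ratio exp(log_prob_new - log_prob_old) during the update.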
Pages: 200-203
Number of pages: 4
Related Papers
50 records in total
  • [1] PPO-CMA: PROXIMAL POLICY OPTIMIZATION WITH COVARIANCE MATRIX ADAPTATION
    Hamalainen, Perttu
    Babadi, Amin
    Ma, Xiaoxiao
    Lehtinen, Jaakko
    PROCEEDINGS OF THE 2020 IEEE 30TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2020
  • [2] Entropy adjustment by interpolation for exploration in Proximal Policy Optimization (PPO)
    Boudlal, Ayoub
    Khafaji, Abderahim
    Elabbadi, Jamal
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [3] PPO-RM: Proximal Policy Optimization Based Route Mutation for Multimedia Services
    Shen, Jiahao
    Zhang, Tao
    Zhang, Bingchi
    Ji, Weixiao
    Kuang, Xiaohui
    Xu, Changqiao
    IWCMC 2021: 2021 17TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE (IWCMC), 2021: 35-40
  • [4] GAA-PPO: A novel graph adversarial attack method by incorporating proximal policy optimization
    Yang, Shuxin
    Chang, Xiaoyang
    Zhu, Guixiang
    Cao, Jie
    Qin, Weiping
    Wang, Youquan
    Wang, Zhendong
    NEUROCOMPUTING, 2023, 557
  • [5] PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming
    Naresh, Mandan
    Saxena, Paresh
    Gupta, Manik
    2023 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2023: 199-204
  • [6] A proximal policy optimization with curiosity algorithm for virtual drone navigation
    Das, Rupayan
    Khan, Angshuman
    Paul, Gunjan
    ENGINEERING RESEARCH EXPRESS, 2024, 6 (01)
  • [7] FPGA implementation of Proximal Policy Optimization algorithm for Edge devices with application to Agriculture Technology
    Waseem S.M.
    Roy S.K.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (10): 14141-14152
  • [8] PPO-TA: Adaptive task allocation via Proximal Policy Optimization for spatio-temporal crowdsourcing
    Zhao, Bingxu
    Dong, Hongbin
    Wang, Yingjie
    Pan, Tingwei
    KNOWLEDGE-BASED SYSTEMS, 2023, 264
  • [9] Optimization of cobalt oxalate synthesis process based on modified proximal policy optimization algorithm
    Jia R.-D.
    Ning W.-B.
    He D.-K.
    Chu F.
    Wang F.-L.
    Kongzhi yu Juece/Control and Decision, 2023, 38 (11): 3075-3082
  • [10] Comparison of Empirical and Reinforcement Learning (RL)-Based Control Based on Proximal Policy Optimization (PPO) for Walking Assistance: Does AI Always Win?
    Drewing, Nadine
    Ahmadi, Arjang
    Xiong, Xiaofeng
    Sharbafi, Maziar Ahmad
    BIOMIMETICS, 2024, 9 (11)