Implementing action mask in proximal policy optimization (PPO) algorithm

Cited by: 42

Authors:

Tang, Cheng-Yen [1]

Liu, Chien-Hung [1]

Chen, Woei-Kae [1]

You, Shingchern D. [1]

Affiliations:

[1] National Taipei University of Technology, Department of Computer Science and Information Engineering, Taipei, Taiwan

Source:

ICT Express | 2020 / Vol. 6 / Issue 3

Keywords:

PPO; Invalid action; Reinforcement learning;

DOI:

10.1016/j.icte.2020.05.003

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the original version, the proposed algorithm yields much higher return with a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask if applicable. (C) 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V.
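The abstract does not say how the mask enters the policy, so the following is only a minimal sketch of one common way to apply a per-state validity mask to a discrete policy, not necessarily the authors' implementation. It assumes the policy network outputs one logit per action and that a boolean mask marks which actions are valid in the current state. The function name `masked_action_probs` and the example values are hypothetical.

```python
import math
import random


def masked_action_probs(logits, valid_mask):
    """Softmax restricted to valid actions; invalid actions get probability 0.

    Assumption: `logits` holds one policy output per discrete action and
    `valid_mask[i]` is True when action i is valid in the current state.
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_valid = max(l for l, ok in zip(logits, valid_mask) if ok)
    weights = [
        math.exp(l - max_valid) if ok else 0.0
        for l, ok in zip(logits, valid_mask)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: four actions, the second one is invalid in this state.
logits = [2.0, 1.0, 0.5, -1.0]
valid_mask = [True, False, True, True]
probs = masked_action_probs(logits, valid_mask)

# Sample an action from the masked distribution; an invalid action is never drawn.
action = random.Random(0).choices(range(len(probs)), weights=probs, k=1)[0]
```

In a PPO setting, the same masked distribution would presumably be used both when collecting rollouts and when computing the probability ratio during the update, so that the old and new action probabilities are defined over the same set of valid actions.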

Pages: 200-203

Page count: 4

50 records in total

[41] Decaying Clipping Range in Proximal Policy Optimization [J].

Farsang, Monika ;

Szegletes, Luca .

IEEE 15TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS (SACI 2021), 2021, :521-525

[42] Multihead Discrete Action Calibration Proximal Policy Optimization Method for Pixel Antennas With High Degrees of Freedom [J].

Chen, Haibiao ;

Li, Shiyuan ;

Wu, Zeming ;

Wang, Lixiao ;

Li, Xiao-Chun ;

Liu, Qing Huo .

IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, 2025, 73 (05) :2881-2894

[43] A Hybrid Genetic Algorithm and Proximal Policy Optimization System for Efficient Multi-Agent Task Allocation [J].

Zhu, Zimo ;

Yu, Chuanqiang ;

Wang, Junti .

SYSTEMS, 2025, 13 (06)

[44] Optimizing parameters in swarm intelligence using reinforcement learning: An application of Proximal Policy Optimization to the iSOMA algorithm [J].

Klein, Lukas ;

Zelinka, Ivan ;

Seidl, David .

SWARM AND EVOLUTIONARY COMPUTATION, 2024, 85

[45] Autonomous Heterogeneous Mining Fleet Control at Nonorthogonal Intersections by Hippopotamus Optimization Algorithm-Based Adaptive Proximal Policy Optimization [J].

Wang, Zhichao ;

Yang, Jue .

TRANSPORTATION RESEARCH RECORD, 2025, 2679 (08) :280-297

[46] A novel guidance law based on proximal policy optimization [J].

Jiang, Yang ;

Yu, Jianglong ;

Li, Qingdong ;

Ren, Zhang ;

Dong, Xiwang ;

Hua, Yongzhao .

2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, :3364-3369

[47] Proximal policy optimization with an integral compensator for quadrotor control [J].

Hu, Huan ;

Wang, Qing-ling .

FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (05) :777-795

[48] Proximal policy optimization with an integral compensator for quadrotor control [J].

Huan Hu ;

Qing-ling Wang .

Frontiers of Information Technology & Electronic Engineering, 2020, 21 :777-795

[49] Proximal policy optimization with reward-based prioritization [J].

Zheng, Mingsheng ;

Zhang, Junwei ;

Zhan, Changshuai ;

Ren, Xinyu ;

Lu, Shuai .

EXPERT SYSTEMS WITH APPLICATIONS, 2025, 283

[50] Proximal Policy Optimization with Elo-based Opponent Selection and Combination with Enhanced Rolling Horizon Evolution Algorithm [J].

Liang, Rongqin ;

Zhu, Yuanheng ;

Tang, Zhentao ;

Yang, Mu ;

Zhu, Xiaolong .

2021 IEEE CONFERENCE ON GAMES (COG), 2021, :1024-1027
