Implementing action mask in proximal policy optimization (PPO) algorithm

Cited by: 33

Authors:

Tang, Cheng-Yen [1]

Liu, Chien-Hung [1]

Chen, Woei-Kae [1]

You, Shingchern D. [1]

Affiliations:

[1] National Taipei University of Technology, Department of Computer Science and Information Engineering, Taipei, Taiwan

Source:

ICT Express | 2020 / Vol. 6 / No. 3

Keywords:

PPO; Invalid action; Reinforcement learning

DOI:

10.1016/j.icte.2020.05.003

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the original version, the proposed algorithm yields much higher return with a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask if applicable. © 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V.
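The abstract does not say how the mask is applied inside PPO. One common approach, shown here only as a minimal sketch and not as the authors' implementation, is to zero out the probability of invalid actions before sampling from the policy. The function names `masked_softmax` and `sample_action`, the logits, and the mask values are all hypothetical and chosen for illustration.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over the valid actions only; invalid actions get probability 0.

    logits: list of floats, one raw policy score per action (hypothetical values).
    mask:   list of bools, True where the action is valid in the current state.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    valid = [l for l, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("at least one action must be valid")
    top = max(valid)  # subtract the largest valid logit for numerical stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng):
    """Sample an action from the masked policy and return (action, log_prob)."""
    probs = masked_softmax(logits, mask)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # The log-probability is what a PPO update would use in its probability ratio.
    return action, math.log(probs[action])


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 2.0, 3.0]
    mask = [True, False, True]  # assumption: action 1 is invalid in this state
    print(masked_softmax(logits, mask))
    print(sample_action(logits, mask, rng))
```

In a sketch like this, the same mask would need to be applied both when collecting experience and when recomputing action probabilities during the policy update, so that the old and new probabilities in the PPO ratio refer to the same masked distribution.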


Pages: 200-203

Number of pages: 4
