Implementing action mask in proximal policy optimization (PPO) algorithm

Cited by: 42

Authors:

Tang, Cheng-Yen [1]

Liu, Chien-Hung [1]

Chen, Woei-Kae [1]

You, Shingchern D. [1]

Affiliation:

[1] Natl Taipei Univ Technol, Dept Comp Sci & Informat Engn, Taipei, Taiwan

Source:

ICT EXPRESS | 2020 / Vol. 6 / Issue 03

Keywords:

PPO; Invalid action; Reinforcement learning

DOI:

10.1016/j.icte.2020.05.003

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the original version, the proposed algorithm yields much higher return with a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask if applicable. © 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V.
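The abstract does not say how the mask is applied inside PPO. One common way to realize an action mask for a discrete policy is to exclude invalid actions before normalizing the policy's logits, so that they receive exactly zero probability. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation; the names `masked_softmax`, `logits`, and `mask` are hypothetical.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions forced to zero.

    logits: list of floats, one per action (assumed policy-network output).
    mask:   list of bools, True where the action is valid in the current state.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")

    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)

    # Invalid actions contribute zero weight, so they can never be sampled.
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical state with four actions, of which actions 1 and 3 are invalid.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], [True, False, True, False])
```

In a PPO-style update, the same mask would presumably also be applied when action probabilities are recomputed for stored states, so that the old and new policies are compared over the same set of valid actions.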


Pages: 200-203

Page count: 4

50 records in total

[21] Guided Proximal Policy Optimization with Structured Action Graph for Complex Decision-making [J].

Yang, Yiming ;

Xing, Dengpeng ;

Xia, Wannian ;

Wang, Peng .

MACHINE INTELLIGENCE RESEARCH, 2025, :797-816

[22] Enhancing Cybersecurity: A Proximal Policy Optimization Approach for Security Policy Optimization [J].

Yang, Jiuling ;

Shi, Jiayi ;

Kuang, Ping ;

Feng, Zhikun ;

Xiong, Kun ;

Shi, Yuan .

PROCEEDINGS OF 2024 8TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE, CSAI 2024, 2024, :614-620

[23] An Efficient Load Balancing Scheme for Gaming Server Using Proximal Policy Optimization Algorithm [J].

Kim, Hye-Young .

JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (02) :297-305

[24] AGV path planning and task scheduling based on improved proximal policy optimization algorithm [J].

Qi, Xuan ;

Zhou, Tong ;

Wang, Cunsong ;

Peng, Xiaotian ;

Peng, Hao .

Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2025, 31 (03) :955-964

[25] Joint Power and Bandwidth Allocation for Internet of Vehicles Based on Proximal Policy Optimization Algorithm [J].

Xu, Sujie ;

Hu, Xin ;

Wang, Libing ;

Wang, Yin ;

Wang, Weidong .

2021 IEEE 20TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2021), 2021, :1352-1357

[26] Proximal Policy Optimization Algorithm for Multi-objective Disassembly Line Balancing Problems [J].

Zhong, ZhaoKai ;

Guo, XiWang ;

Zhou, MengChu ;

Wang, Jiacun ;

Qin, Shujin ;

Qi, Liang .

2022 AUSTRALIAN & NEW ZEALAND CONTROL CONFERENCE, ANZCC, 2022, :207-212

[27] Path Planning for Multi-UAV Based on Improved Proximal Policy Optimization Algorithm [J].

Zhu, Wenya ;

Fang, Wenxing ;

Su, Yanxu .

39TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION, YAC 2024, 2024, :1895-1899

[28] Proximal Policy Optimization with Entropy Regularization [J].

Shen, Yuqing .

2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, :380-383

[29] Authentic Boundary Proximal Policy Optimization [J].

Cheng, Yuhu ;

Huang, Longyang ;

Wang, Xuesong .

IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) :9428-9438

[30] Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games [J].

Kristensen, Jeppe Theiss ;

Burelli, Paolo .

PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON THE FOUNDATIONS OF DIGITAL GAMES, FDG 2020, 2020,
