Implementing action mask in proximal policy optimization (PPO) algorithm

Cited by: 33
Authors
Tang, Cheng-Yen [1]
Liu, Chien-Hung [1]
Chen, Woei-Kae [1]
You, Shingchern D. [1]
Affiliations
[1] Natl Taipei Univ Technol, Dept Comp Sci & Informat Engn, Taipei, Taiwan
Source
ICT EXPRESS | 2020 / Vol. 6 / Issue 03
Keywords
PPO; Invalid action; Reinforcement learning
DOI
10.1016/j.icte.2020.05.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask in the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, when compared with the original version, the proposed algorithm yields much higher return with a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask if applicable. © 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V.
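The abstract does not say how the mask is applied, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes a discrete action space, sets the logits of invalid actions to negative infinity before the softmax, and uses the resulting probabilities in the standard PPO clipped surrogate. The names masked_policy and clipped_surrogate, the example logits, and the clip_eps value of 0.2 are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_policy(logits, mask):
    """Softmax over logits with invalid actions (mask == False) forced to probability 0."""
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf, so exp(-inf) == 0 removes them from the softmax.
    masked_logits = np.where(mask, logits, -np.inf)
    shifted = masked_logits - masked_logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def clipped_surrogate(new_prob, old_prob, advantage, clip_eps=0.2):
    """Standard PPO clipped objective for one sample.

    Assumption: both probabilities come from the masked distribution.
    """
    ratio = new_prob / old_prob
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical state with four actions, of which action 1 is invalid.
rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 0.5, -1.0])
mask = np.array([True, False, True, True])

probs = masked_policy(logits, mask)
action = int(rng.choice(len(probs), p=probs))  # never samples action 1
```

Because an invalid action has probability zero, it is never sampled during rollouts, and the remaining probability mass is renormalized over the valid actions only.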
Pages: 200-203
Page count: 4