Implementing action mask in proximal policy optimization (PPO) algorithm

Cited by: 33
Authors
Tang, Cheng-Yen [1 ]
Liu, Chien-Hung [1 ]
Chen, Woei-Kae [1 ]
You, Shingchern D. [1 ]
Affiliations
[1] Natl Taipei Univ Technol, Dept Comp Sci & Informat Engn, Taipei, Taiwan
Source
ICT EXPRESS | 2020, Vol. 6, No. 3
Keywords
PPO; Invalid action; Reinforcement learning;
DOI
10.1016/j.icte.2020.05.003
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose to add an action mask to the PPO algorithm. The mask indicates whether an action is valid or invalid for each state. Simulation results show that, compared with the original version, the proposed algorithm yields a much higher return within a moderate number of training steps. Therefore, it is useful and valuable to incorporate such a mask when applicable. (C) 2020 The Korean Institute of Communications and Information Sciences (KICS). Publishing services by Elsevier B.V.
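For context, the masking described in the abstract is commonly realized by overwriting the logits of invalid actions before the softmax, so that invalid actions receive near-zero probability both when sampling and when computing the PPO probability ratio. The following is a minimal sketch under those assumptions, not the authors' code; the helper name masked_categorical, the -1e8 constant, and the PyTorch-based setup are illustrative.

    # Minimal sketch (illustrative, not the paper's implementation):
    # apply a per-state action mask to policy logits so invalid actions
    # get (near-)zero probability before sampling and before the PPO update.
    import torch

    def masked_categorical(logits: torch.Tensor, action_mask: torch.Tensor):
        """logits: (batch, n_actions); action_mask: (batch, n_actions), 1 = valid, 0 = invalid."""
        # Push invalid-action logits to a very large negative value so the
        # softmax assigns them probability ~0 and they are never sampled.
        masked_logits = torch.where(action_mask.bool(), logits,
                                    torch.full_like(logits, -1e8))
        return torch.distributions.Categorical(logits=masked_logits)

    # Usage sketch:
    #   dist = masked_categorical(policy_net(obs), mask_from_env(obs))
    #   action = dist.sample(); log_prob = dist.log_prob(action)
    #   The same masked distribution is used for the new log-probabilities when
    #   forming the PPO ratio exp(log_prob_new - log_prob_old) during the update.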
Pages: 200-203
Number of pages: 4
Related Papers
50 records in total
  • [1] PPO-CMA: PROXIMAL POLICY OPTIMIZATION WITH COVARIANCE MATRIX ADAPTATION
    Hamalainen, Perttu
    Babadi, Amin
    Ma, Xiaoxiao
    Lehtinen, Jaakko
    PROCEEDINGS OF THE 2020 IEEE 30TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2020
  • [2] Entropy adjustment by interpolation for exploration in Proximal Policy Optimization (PPO)
    Boudlal, Ayoub
    Khafaji, Abderahim
    Elabbadi, Jamal
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [3] PPO-RM: Proximal Policy Optimization Based Route Mutation for Multimedia Services
    Shen, Jiahao
    Zhang, Tao
    Zhang, Bingchi
    Ji, Weixiao
    Kuang, Xiaohui
    Xu, Changqiao
    IWCMC 2021: 2021 17TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE (IWCMC), 2021: 35-40
  • [4] GAA-PPO: A novel graph adversarial attack method by incorporating proximal policy optimization
    Yang, Shuxin
    Chang, Xiaoyang
    Zhu, Guixiang
    Cao, Jie
    Qin, Weiping
    Wang, Youquan
    Wang, Zhendong
    NEUROCOMPUTING, 2023, 557
  • [5] PPO-ABR: Proximal Policy Optimization based Deep Reinforcement Learning for Adaptive BitRate streaming
    Naresh, Mandan
    Saxena, Paresh
    Gupta, Manik
    2023 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2023: 199-204
  • [6] A proximal policy optimization with curiosity algorithm for virtual drone navigation
    Das, Rupayan
    Khan, Angshuman
    Paul, Gunjan
    ENGINEERING RESEARCH EXPRESS, 2024, 6 (01)
  • [7] FPGA implementation of Proximal Policy Optimization algorithm for Edge devices with application to Agriculture Technology
    Waseem S.M.
    Roy S.K.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (10): 14141-14152
  • [8] PPO-TA: Adaptive task allocation via Proximal Policy Optimization for spatio-temporal crowdsourcing
    Zhao, Bingxu
    Dong, Hongbin
    Wang, Yingjie
    Pan, Tingwei
    KNOWLEDGE-BASED SYSTEMS, 2023, 264
  • [9] Optimization of cobalt oxalate synthesis process based on modified proximal policy optimization algorithm
    Jia R.-D.
    Ning W.-B.
    He D.-K.
    Chu F.
    Wang F.-L.
    Kongzhi yu Juece/Control and Decision, 2023, 38 (11): 3075-3082
  • [10] Comparison of Empirical and Reinforcement Learning (RL)-Based Control Based on Proximal Policy Optimization (PPO) for Walking Assistance: Does AI Always Win?
    Drewing, Nadine
    Ahmadi, Arjang
    Xiong, Xiaofeng
    Sharbafi, Maziar Ahmad
    BIOMIMETICS, 2024, 9 (11)