Exploring Multi-action Relationship in Reinforcement Learning

Cited: 8
Authors
Wang, Han [1 ]
Yu, Yang [1 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
Keywords
DOI
10.1007/978-3-319-42911-3_48
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In many real-world reinforcement learning problems, an agent needs to control multiple actions simultaneously. To learn under this circumstance, each action was previously treated independently of the others. However, these multiple actions are rarely independent in applications, and learning could be accelerated if the underlying relationship among the actions is exploited. This paper explores the multi-action relationship in reinforcement learning. We propose to learn the multi-action relationship by enforcing a regularization term that captures it. We incorporate the regularization term into the least-squares policy-iteration and temporal-difference methods, which results in efficiently solvable convex learning objectives. The proposed methods are validated empirically in several domains. Experimental results show that incorporating the multi-action relationship can effectively improve learning performance.
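The record does not give the paper's actual formulation, so the sketch below is only a hedged illustration of the general idea the abstract describes: fitting per-action value weights by least squares with a convex regularizer that couples the actions through a learned relationship matrix (in the style of multi-task relationship learning). All names here (`fit_relationship_regularized`, `Omega`, `lam`) are hypothetical and not from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def fit_relationship_regularized(Phi, Y, lam=0.1, iters=10):
    """Least-squares fit of per-action weights W (one column per action)
    with a relationship regularizer lam * tr(W Omega^{-1} W^T).

    Phi : (n, d) state-feature matrix
    Y   : (n, k) regression targets, e.g. per-action TD targets
    """
    n, d = Phi.shape
    k = Y.shape[1]
    Omega = np.eye(k) / k          # action-relationship matrix (init: uncorrelated)
    G = Phi.T @ Phi                # shared Gram matrix over state features
    b = (Phi.T @ Y).reshape(-1, order="F")
    for _ in range(iters):
        # Stationarity condition of the convex objective in W:
        #   G W + lam * W Omega^{-1} = Phi^T Y
        # solved via its Kronecker-vectorized linear system (column-major vec).
        A = np.kron(np.eye(k), G) + lam * np.kron(np.linalg.inv(Omega), np.eye(d))
        W = np.linalg.solve(A, b).reshape(d, k, order="F")
        # Closed-form relationship update given W: Omega proportional to (W^T W)^{1/2}
        S = sqrtm(W.T @ W).real + 1e-8 * np.eye(k)
        Omega = S / np.trace(S)
    return W, Omega
```

For fixed `Omega` the objective in `W` is convex (a regularized least-squares problem), matching the abstract's claim of efficiently solvable convex learning objectives; alternating the two updates is one common way to also learn the relationship matrix.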
Pages: 574-587
Page count: 14
Related Papers
50 items total
  • [41] Explainable Action Advising for Multi-Agent Reinforcement Learning
    Guo, Yue
    Campbell, Joseph
    Stepputtis, Simon
    Li, Ruiyu
    Hughes, Dana
    Fang, Fei
    Sycara, Katia
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 5515 - 5521
  • [42] Action Markets in Deep Multi-Agent Reinforcement Learning
    Schmid, Kyrill
    Belzner, Lenz
    Gabor, Thomas
    Phan, Thomy
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT II, 2018, 11140 : 240 - 249
  • [43] ACTION DISCOVERY FOR SINGLE AND MULTI-AGENT REINFORCEMENT LEARNING
    Banerjee, Bikramjit
    Kraemer, Landon
    ADVANCES IN COMPLEX SYSTEMS, 2011, 14 (02): : 279 - 305
  • [44] Simultaneous particle tracking in multi-action motion models with synthesized paths
    Ukita, Norimichi
    IMAGE AND VISION COMPUTING, 2013, 31 (6-7) : 448 - 459
  • [45] Abrupt-joins as a resource for the production of multi-unit, multi-action turns
    Local, J
    Walker, G
    JOURNAL OF PRAGMATICS, 2004, 36 (08) : 1375 - 1403
  • [46] A vibration segmentation approach for the multi-action system of numerical control turret
    Hu, Wei
    Yang, Zhaojun
    Chen, Chuanhai
    Sun, Bo
    Xie, Qunya
    Signal, Image and Video Processing, 2022, 16 : 489 - 496
  • [47] Multi-goal Reinforcement Learning via Exploring Successor Matching
    Feng, Xiaoyun
    2022 IEEE CONFERENCE ON GAMES, COG, 2022, : 401 - 408
  • [48] Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness
    Xiong, Guojun
    Wang, Shufan
    Li, Jian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [49] An integer programming method for the design of multi-criteria multi-action conservation plans
    Alvarez-Miranda, Eduardo
    Salgado-Rojas, Jose
    Hermoso, Virgilio
    Garcia-Gonzalo, Jordi
    Weintraub, Andres
    OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2020, 92 (92):
  • [50] Zero-determinant Strategies for Multi-player Multi-action Iterated Games
    He, Xiaofan
    Dai, Huaiyu
    Ning, Peng
    Dutta, Rudra
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (03) : 311 - 315