Reinforcement Learning From Hierarchical Critics

Cited by: 7
Authors
Cao, Zehong [1 ]
Lin, Chin-Teng [2 ]
Affiliations
[1] Univ South Australia, STEM, Adelaide, SA 5095, Australia
[2] Univ Technol Sydney, Australian Artificial Intelligence Inst AAII, Sch Comp Sci, Sydney, NSW 2007, Australia
Funding
Australian Research Council;
Keywords
Task analysis; Training; Sports; Linear programming; Games; Reinforcement learning; Optimization; Competition; critics; hierarchy; reinforcement learning (RL);
DOI
10.1109/TNNLS.2021.3103642
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study, we investigate the use of global information to speed up the learning process and increase the cumulative rewards of reinforcement learning (RL) in competition tasks. Within the framework of actor-critic RL, we introduce multiple cooperative critics from two levels of a hierarchy and propose an RL from the hierarchical critics (RLHC) algorithm. In our approach, each agent receives value information from local and global critics regarding a competition task and accesses multiple cooperative critics in a top-down hierarchy. Thus, each agent not only receives low-level details but also considers coordination from higher levels, thereby obtaining global information to improve the training performance. We then test the proposed RLHC algorithm against a benchmark algorithm, proximal policy optimization (PPO), under four experimental scenarios within the Unity environment: tennis, soccer, banana collection, and crawler competitions. The results show that RLHC outperforms the benchmark on these four competitive tasks.
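The abstract describes each agent drawing on value estimates from both a local and a global critic within an actor-critic update. A minimal sketch of that idea follows; the fusion rule (a fixed weighted average with weight `beta`) and all function names here are illustrative assumptions, not the paper's exact method.

```python
def hierarchical_value(v_local, v_global, beta=0.5):
    """Blend a local critic's estimate with a global (higher-level) critic's
    estimate; beta controls how much top-down information the agent uses."""
    return (1.0 - beta) * v_local + beta * v_global

def one_step_advantage(reward, v_next, v_now, gamma=0.99):
    """One-step advantage estimate as used by a generic actor-critic update."""
    return reward + gamma * v_next - v_now

# The local critic sees only the agent's own observation; the global critic
# sees the shared, competition-level state (values below are made up).
v_now = hierarchical_value(v_local=1.0, v_global=2.0)   # -> 1.5
v_next = hierarchical_value(v_local=1.2, v_global=1.8)  # -> 1.5
adv = one_step_advantage(reward=0.1, v_next=v_next, v_now=v_now)
print(round(adv, 4))
```

In this sketch the actor's policy-gradient step would then be weighted by `adv`; the hierarchical ingredient is simply that `v_now` and `v_next` mix both critic levels instead of using the local critic alone.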
Pages: 1066-1073
Page count: 8