Policy Optimization with Augmented Value Targets for Generalization in Reinforcement Learning

Times Cited: 1
Authors
Nafi, Nasik Muhammad [1 ]
Poggi-Corradini, Giovanni [1 ]
Hsu, William [1 ]
Affiliations
[1] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2023
Keywords
reinforcement learning; generalization; value estimation; overfitting; target augmentation; policy optimization;
DOI
10.1109/IJCNN54540.2023.10191507
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Our work aims to improve the generalization performance of a reinforcement learning (RL) agent on unseen environment variations. The value function used in RL agents frequently overfits, leading to poor generalization. In this work, we argue that task completion time is strongly affected by varying environmental conditions, which leads to variation in episode lengths and, consequently, in the value estimates. As a result, when learning from a limited set of environment variations, the agent becomes biased toward value estimates that correspond to the observed episode lengths. To address this, we introduce Augmented Value Targets (AVaTar), which generates multiple value function targets that account for possible variation in episode length and optimizes the value function against the average of these targets. We show that optimizing the average of the augmented targets is computationally more feasible than leveraging each pseudo-target independently. Evaluations on the Procgen and Crafter benchmarks show that our approach is effective in generalizing value estimates to unseen contexts and significantly outperforms the standard policy gradient algorithm Proximal Policy Optimization (PPO). Furthermore, comparison and integration with the recent generalization-specific approach UCB-DrAC indicate that AVaTar outperforms UCB-DrAC in most Procgen environments.
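The sketch below illustrates the averaged-target idea described in the abstract, written in Python with PyTorch. It is a minimal, hypothetical example only: the function name avatar_targets, the horizon values, and the use of truncated discounted returns as the augmented targets are assumptions for illustration; the paper's exact AVaTar target construction and any bootstrapping details are not specified in this record.

import torch

def avatar_targets(rewards, gamma=0.99, horizons=(8, 16, 32)):
    # Hypothetical sketch: build one value target per assumed episode-length
    # horizon by truncating the discounted return at that horizon, then
    # average the targets. The actual AVaTar construction may differ.
    T = rewards.shape[0]
    targets = []
    for h in horizons:
        g = torch.zeros(T)
        for t in range(T):
            end = min(t + h, T)  # pretend the episode ends h steps after t
            discounts = gamma ** torch.arange(end - t, dtype=rewards.dtype)
            g[t] = (discounts * rewards[t:end]).sum()
        targets.append(g)
    # A single averaged target is cheaper to fit than each pseudo-target separately.
    return torch.stack(targets).mean(dim=0)

# Usage in a PPO-style value loss (sketch):
#   value_loss = ((value_pred - avatar_targets(rewards)) ** 2).mean()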
Pages: 8