Feedback Decision Transformer: Offline Reinforcement Learning With Feedback

Times cited: 0
Authors
Giladi, Liad [1]
Katz, Gilad [1]
Affiliation
[1] Ben Gurion Univ Negev, Beer Sheva, Israel
Source
23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023 | 2023
Keywords
Deep Reinforcement Learning; Offline Reinforcement Learning; RL with Human Feedback;
DOI
10.1109/ICDM58522.2023.00120
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent trajectory optimization methods for offline reinforcement learning (RL) define the problem as one of conditional-sequence policy modeling. One of these methods is Decision Transformer (DT), a Transformer-based trajectory optimization approach that achieves results competitive with the current state of the art. Despite its strong capabilities, DT underperforms when the training data does not contain full trajectories, or when the recorded behavior does not offer sufficient coverage of the state-action space. We propose Feedback Decision Transformer (FDT), a data-driven approach that uses limited amounts of high-quality feedback at critical states to significantly improve DT's performance. Our approach analyzes and estimates the Q-function across the state-action space, and identifies areas where feedback is likely to be most impactful. Next, we integrate this feedback into our model, and use it to improve our model's performance. Extensive evaluation and analysis on four Atari games show that FDT significantly outperforms DT in multiple setups and configurations.
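The abstract does not state how FDT scores states or how feedback enters the training data, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a per-state vector of estimated Q-values is already available, ranks states by the spread between the best and worst action value (a common heuristic for state importance, not necessarily the paper's criterion), and overwrites the logged action with externally supplied feedback at the selected states. The names `importance`, `select_feedback_states`, and `apply_feedback` are hypothetical.

```python
from typing import Dict, List, Sequence


def importance(q_values: Sequence[float]) -> float:
    """Spread between the best and worst estimated action value in one state.

    Assumption: a large spread means the action choice matters more here.
    """
    return max(q_values) - min(q_values)


def select_feedback_states(q_table: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` states with the largest Q-value spread."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    scores = [importance(q) for q in q_table]
    # Rank state indices from most to least important, then keep the top `budget`.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def apply_feedback(
    logged_actions: Sequence[int],
    chosen_states: Sequence[int],
    feedback: Dict[int, int],
) -> List[int]:
    """Replace logged actions with feedback actions at the chosen states only.

    `feedback` maps a state index to the action supplied by an external source
    (hypothetical; for example a human or a stronger policy).
    """
    relabeled = list(logged_actions)
    for i in chosen_states:
        if i in feedback:
            relabeled[i] = feedback[i]
    return relabeled


if __name__ == "__main__":
    # Toy Q-estimates for four states and three actions (made-up numbers).
    q_table = [
        [1.0, 1.1, 0.9],
        [0.0, 5.0, 0.2],
        [2.0, 2.0, 2.0],
        [3.0, -1.0, 0.5],
    ]
    chosen = select_feedback_states(q_table, budget=2)
    print(chosen)
    print(apply_feedback([0, 0, 0, 0], chosen, {1: 1, 3: 0}))
```

The feedback budget is kept as an explicit parameter because the abstract stresses that only limited amounts of feedback are used; how the relabeled trajectories are then fed to the Transformer is outside what this sketch covers.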
Pages: 1037-1042
Page count: 6