LLM-Guided Reinforcement Learning for Interactive Environments

Authors
Yang, Fuxue [1]
Liu, Jiawen [1]
Li, Kan [1]
Affiliations
[1] Beijing Institute of Technology, School of Computer Science and Technology, Beijing 100081, People's Republic of China
Funding
Beijing Natural Science Foundation
Keywords
reinforcement learning; large language models; chain of thought; language
DOI
10.3390/math13121932
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
We propose LLM-Guided Reinforcement Learning (LGRL), a framework that uses large language models (LLMs) to decompose high-level objectives into a sequence of manageable subgoals in interactive environments. The approach decouples high-level planning from low-level action execution: the LLM dynamically generates context-aware subgoals, and these subgoals guide the reinforcement learning (RL) agent. During training, intermediate subgoals, each associated with a partial reward, are generated based on the agent's current progress, providing fine-grained feedback that facilitates structured exploration and accelerates convergence. At inference, a chain-of-thought strategy enables the LLM to adaptively update subgoals in response to evolving environmental states. Although demonstrated on one representative interactive setting, the method is designed to generalize to a wide range of complex, goal-oriented tasks. Experimental results show that LGRL achieves higher success rates, improved efficiency, and faster convergence than baseline approaches.
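The subgoal mechanism described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-dimensional corridor environment, the `Subgoal` class, the reward values (0.25 per intermediate subgoal, 1.0 for the final goal), and in particular `stub_planner`, a hypothetical rule-based stand-in for the LLM call. No learning is performed; a fixed policy is used only to show how partial rewards accumulate and how subgoals are regenerated from the agent's current progress.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Subgoal:
    target: int            # position the agent should reach (toy state)
    partial_reward: float  # reward granted when the subgoal is completed


def stub_planner(position: int, goal: int) -> List[Subgoal]:
    """Hypothetical stand-in for the LLM planner (assumption, not the paper's).

    Proposes the midpoint between the current position and the goal as an
    intermediate subgoal, followed by the final goal itself.
    """
    midpoint = (position + goal) // 2
    plan = []
    if midpoint not in (position, goal):
        plan.append(Subgoal(target=midpoint, partial_reward=0.25))
    plan.append(Subgoal(target=goal, partial_reward=1.0))
    return plan


def greedy_policy(position: int, subgoal: Subgoal) -> int:
    """Fixed low-level policy: step right until the subgoal is reached."""
    return 1 if position < subgoal.target else 0


def run_episode(
    policy: Callable[[int, Subgoal], int],
    start: int = 0,
    goal: int = 8,
    max_steps: int = 50,
) -> Tuple[float, List[int]]:
    """Run one episode, re-planning each time a subgoal is completed."""
    position = start
    total_reward = 0.0
    completed: List[int] = []
    plan = stub_planner(position, goal)
    for _ in range(max_steps):
        current = plan[0]
        position += policy(position, current)
        if position >= current.target:
            # Partial reward for finishing the current subgoal.
            total_reward += current.partial_reward
            completed.append(current.target)
            if current.target == goal:
                break
            # Regenerate subgoals from the agent's current progress.
            plan = stub_planner(position, goal)
    return total_reward, completed


if __name__ == "__main__":
    print(run_episode(greedy_policy))  # prints (1.75, [4, 6, 7, 8])
```

In a learning setup, the accumulated partial rewards would feed the RL update rather than a fixed policy, and the planner would be an actual LLM prompted with the environment state; this record does not specify either.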
Pages: 13