Intrinsic Motivation in Model-Based Reinforcement Learning: A Brief Review

Cited by: 0
Authors
A. K. Latyshev [1 ]
A. I. Panov [2 ]
Affiliations
[1] Moscow Institute of Physics and Technology, Moscow
[2] Federal Research Center “Computer Science and Control,” Russian Academy of Science, Moscow
[3] AIRI, Moscow
Keywords
environment exploration; intrinsic motivation; reinforcement learning; world model;
DOI
10.3103/S0147688224700370
Abstract
The reinforcement learning approach offers a wide range of methods for solving problems of controlling intelligent agents. However, the problem of training an agent from sparse rewards remains relevant. One possible solution is to use methods of intrinsic motivation, an idea that comes from developmental psychology. Intrinsic motivation explains human behavior in the absence of extrinsic control stimuli. In this article, we reviewed the existing methods of determining intrinsic motivation based on a learned world model. A systematization of modern works in this field of study was proposed. This system consists of three classes of methods that differ in how the world model is applied to the agent's components: the reward system, the exploration policy, and intrinsic goals. We proposed a unified framework for describing the architecture of an agent that uses a world model and intrinsic motivation to improve learning. The prospects for development in this field of study were analyzed. © Allerton Press, Inc. 2024.
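The first of the three classes described above applies the world model to the agent's reward system. A common instantiation of that idea (a minimal illustrative sketch, not the paper's own method; the toy `world_model` and the quadratic error bonus are assumptions here) is to pay the agent an intrinsic reward equal to the world model's one-step prediction error, so that poorly modeled, and hence novel, transitions are rewarded:

```python
import numpy as np

def intrinsic_reward(world_model, state, action, next_state):
    """Curiosity-style bonus: the world model's one-step prediction error.

    Transitions the model predicts poorly are treated as novel and
    receive a larger intrinsic reward.
    """
    predicted = world_model(state, action)
    return float(np.sum((predicted - next_state) ** 2))

# Toy "learned" world model of 1D dynamics (illustrative only):
# it slightly underestimates the effect of the action.
def world_model(state, action):
    return state + 0.9 * action

state = np.array([0.0])
action = np.array([1.0])
true_next = state + action  # the real environment transition

r_int = intrinsic_reward(world_model, state, action, true_next)
# model error = (0.9 - 1.0)^2 = 0.01, so r_int ≈ 0.01
```

In practice this bonus is added to the (sparse) extrinsic reward before policy optimization, and the world model is trained online on the same transitions, so the bonus decays as the model improves.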
Pages: 460–470 (10 pages)