Bioinspired actor-critic algorithm for reinforcement learning interpretation with Levy-Brown hybrid exploration strategy

Times cited: 3
Authors
Wang, Xiao [1 ,2 ]
Li, Dazi [1 ,3 ]
Affiliations
[1] Beijing Univ Chem Technol, Beijing 100029, Peoples R China
[2] Beijing Univ Chem Technol, Lecturer Coll Informat Sci & Technol, Beijing, Peoples R China
[3] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Interpretable reinforcement learning; Levy motion; Pareto distribution; Actor-critic; AVOIDANCE; NETWORKS; GAME;
DOI
10.1016/j.neucom.2024.127291
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The interpretability of reinforcement learning algorithms is currently a challenge, and this lack of interpretability limits the use of reinforcement learning with agents in the physical world. To improve interpretability, this study proposes a Levy-Brown hybrid strategy that modifies the exploration mechanism of the traditional Actor-Critic algorithm. The proposed strategy is bioinspired by Brownian motion and Levy motion in nature; it can therefore explain the data-acquisition process during learning from biological principles. The main idea of the new strategy is to map the Gaussian exploration strategy to biological Brownian motion and to introduce a biological Levy strategy that improves exploration efficiency. Combining the two strategies effectively exploits the Levy strategy's faster exploration and the Brownian strategy's greater stability. Experiments demonstrate the advantages of the proposed Levy-Brown hybrid strategy, which effectively combines the strengths and mitigates the weaknesses of both component strategies.
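The abstract's core idea, mixing Gaussian (Brownian) exploration noise with occasional heavy-tailed Levy-style jumps, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the mixing probability `levy_prob`, the Pareto-based heavy-tailed step, and all default parameters are assumptions for illustration.

```python
import numpy as np

def levy_brown_action(mean_action, sigma=0.1, levy_prob=0.2,
                      pareto_alpha=1.5, rng=None):
    """Hypothetical sketch of a Levy-Brown hybrid exploration step.

    With probability `levy_prob` the exploration noise is drawn from a
    heavy-tailed Pareto-based distribution, producing occasional long
    jumps (Levy-like exploration); otherwise Gaussian noise is used,
    mirroring Brownian motion for stable local exploration. Names and
    defaults are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < levy_prob:
        # Heavy-tailed step: Pareto-distributed magnitude, random sign.
        magnitude = sigma * (rng.pareto(pareto_alpha) + 1.0)
        noise = magnitude * rng.choice([-1.0, 1.0],
                                       size=np.shape(mean_action))
    else:
        # Brownian (Gaussian) step for fine-grained local search.
        noise = rng.normal(0.0, sigma, size=np.shape(mean_action))
    return mean_action + noise
```

Under this sketch, the Gaussian branch keeps exploration concentrated near the actor's mean action, while the rarer Pareto branch provides the long-range jumps that speed up exploration, matching the division of labor the abstract attributes to the Brown and Levy components.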
Pages: 16