A leading adaptive activation function for deep reinforcement learning

Cited by: 0
Authors
Jia, Chongjiexin [1 ,2 ,3 ]
Li, Tuanjie [2 ,3 ]
Dong, Hangjia [2 ,3 ]
Xie, Chao [1 ]
Peng, Wenxuan [1 ]
Ning, Yuming [2 ,3 ]
Affiliations
[1] Shanghai Key Lab Spacecraft Mech, Shanghai 201108, Peoples R China
[2] Xidian Univ, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China
[3] Xidian Univ, State Key Lab Electromech Integrated Mfg High Perf, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Activation function; Reinforcement learning; Leading adaptive TanH; Deep Q-learning; Soft actor-critic;
DOI
10.1016/j.jocs.2025.102608
Chinese Library Classification (CLC)
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
The activation function gives deep reinforcement learning the capacity to solve nonlinear problems. However, traditional activation functions have fixed parameters and cannot adapt to constantly changing environmental conditions, which often leads to slow convergence and inadequate performance of trained agents on highly complex nonlinear problems. This paper proposes a new method, the leading adaptive TanH (LaTanH) activation function, to enhance the ability of reinforcement learning to handle such problems. The method consists of two parts: first, the activation function's parameters are initialized from characteristics of the environment; second, the Adam algorithm updates these parameters dynamically during training. The proposed activation function is compared with both traditional and state-of-the-art activation functions in two experiments. Compared to ReLU, TanH, APA, and EReLU, its convergence speed in DQN tasks is improved by factors of 3.89, 1.29, 0.981, and 2.173, respectively, and in SAC tasks by factors of 1.504, 1.013, 1.017, and 1.131, respectively. The results demonstrate that an agent using LaTanH as its activation function converges faster, performs better, and alleviates the problems of bilateral saturation and vanishing gradients.
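The abstract does not give LaTanH's functional form, so the following is only a minimal PyTorch sketch of the general idea: a TanH-style activation with trainable parameters that Adam updates alongside the network weights. The AdaptiveTanh class, the form a·tanh(b·x), the initial parameter values, and the toy Q-network dimensions are assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn as nn

class AdaptiveTanh(nn.Module):
    """Hypothetical adaptive TanH: f(x) = a * tanh(b * x), with a and b trainable."""
    def __init__(self, init_a: float = 1.0, init_b: float = 1.0):
        super().__init__()
        # The paper initializes the parameters from environment characteristics;
        # plain constants are used here as a placeholder assumption.
        self.a = nn.Parameter(torch.tensor(init_a))
        self.b = nn.Parameter(torch.tensor(init_b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.a * torch.tanh(self.b * x)

# Toy Q-network for a DQN-style agent (4-dimensional state, 2 discrete actions).
# Because a and b are nn.Parameter objects, Adam updates them together with the
# layer weights, so the activation's shape adapts over the course of training.
q_net = nn.Sequential(nn.Linear(4, 64), AdaptiveTanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
```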
Pages: 11