Towards efficient long-horizon decision-making using automated structure search method of hierarchical reinforcement learning for edge artificial intelligence

Cited by: 3
Authors
Wu, Guanlin [1 ,2 ]
Bao, Weidong [2 ]
Cao, Jiang [1 ]
Zhu, Xiaomin [2 ]
Wang, Ji [2 ]
Xiao, Wenhua [1 ]
Liang, Wenqian [2 ]
Affiliations
[1] Acad Mil Sci PLA, Beijing, Peoples R China
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
IoT decision-making tasks; Hierarchical reinforcement learning; Embedded exploration and exploitation process; Synchronous training architecture; Adaptive evolutionary method;
DOI
10.1016/j.iot.2023.100951
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Hierarchical reinforcement learning (HRL) is a promising approach for efficiently solving various long-horizon decision-making tasks in the Internet of Things (IoT) domain. However, HRL algorithms are known to rely on expert knowledge to preset an appropriate hierarchical structure for each IoT task, which raises trial costs and limits their wider application. In this paper, we propose a new method called DHRL (Dynamic-Level Hierarchical Reinforcement Learning), which adaptively searches for the optimal hierarchical structure while maintaining the generality of the framework design. DHRL incorporates an embedded exploration and exploitation mechanism that effectively resolves the challenges caused by the dependence between different levels and strikes a balance between maximizing returns and maintaining accurate current evaluations. Nonetheless, the additional exploration processes inevitably have a negative impact on performance. To mitigate this influence, we propose a synchronous training architecture that lets DHRL operate in a distributed and parallel manner, into which an adaptive evolutionary method is also introduced to accelerate convergence. Extensive experimental evaluations demonstrate the effectiveness of our theory and method.
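The abstract's central mechanism, searching over candidate hierarchical structures while trading off exploration against exploitation, can be made concrete with a small sketch. The following Python snippet is an illustrative assumption, not the authors' DHRL algorithm: it frames the choice of hierarchy depth as an epsilon-greedy bandit, and evaluate_structure, CANDIDATE_DEPTHS, and the synthetic reward model are all hypothetical stand-ins for a real HRL training run.

import random

# Hypothetical sketch: treat the choice of hierarchy depth as a
# multi-armed bandit, exploring untried structures while exploiting
# the best-performing one. Names and the reward model are illustrative
# assumptions, not DHRL's actual design.

CANDIDATE_DEPTHS = [1, 2, 3, 4]  # candidate numbers of hierarchy levels
EPSILON = 0.2                    # exploration rate

value = {d: 0.0 for d in CANDIDATE_DEPTHS}  # running mean return per depth
count = {d: 0 for d in CANDIDATE_DEPTHS}

def evaluate_structure(depth: int) -> float:
    """Stand-in for training an HRL agent with `depth` levels and
    returning its episodic return; here just a noisy synthetic score."""
    return -abs(depth - 3) + random.gauss(0.0, 0.5)

for step in range(200):
    if random.random() < EPSILON:                  # explore a random structure
        depth = random.choice(CANDIDATE_DEPTHS)
    else:                                          # exploit the current best
        depth = max(CANDIDATE_DEPTHS, key=value.get)
    reward = evaluate_structure(depth)
    count[depth] += 1
    value[depth] += (reward - value[depth]) / count[depth]  # incremental mean

print("estimated value per depth:", {d: round(v, 2) for d, v in value.items()})

In the paper's setting, each evaluation of a candidate structure would itself be a full HRL training process, which is presumably why the authors pair the search with a distributed, synchronous training architecture to absorb the exploration cost.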
Pages: 18