Towards end-to-end formation control for robotic fish via deep reinforcement learning with non-expert imitation

Cited by: 14
Authors
Sun, Yihao [1]
Yan, Chao [1]
Xiang, Xiaojia [1]
Zhou, Han [1]
Tang, Dengqing [1]
Zhu, Yi [2]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China
[2] Guangdong Ocean Univ, Shenzhen Inst, Ocean Intelligence Technol Ctr, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep reinforcement learning; Formation control; Robotic fish; Imitation learning
DOI
10.1016/j.oceaneng.2023.113811
Chinese Library Classification (CLC) number
U6 [Waterway transportation]; P75 [Ocean engineering]
Discipline classification code
0814; 081505; 0824; 082401
Abstract
Collaboration among multiple robotic fish can accomplish various underwater tasks effectively. However, controlling robotic fish to maintain a specific formation remains a major challenge, especially in complex and changing flow fields. This paper presents an end-to-end formation control approach in the leader-follower topology that combines deep reinforcement learning and imitation learning. First, we build a high-fidelity environment based on computational fluid dynamics (CFD) to generate samples for training the formation controller. In this environment, we maneuver the robotic fish by adjusting the maximum swing of its tail. Then, we model the formation control problem as a Markov decision process (MDP), in which a compound reward function is tailored to guide the training. To improve the learning efficiency of the deep reinforcement learning (DRL) based controller, we propose a novel DRL algorithm built on top of deep Q-networks (DQN) and behavior cloning, which we call dueling double DQN (D3QN) with imitation. Combined with the designed imitation-based action selection strategy, this algorithm significantly reduces the blindness of agent exploration at the beginning of training. A series of experiments demonstrates the advantages of the proposed algorithm in terms of control accuracy, training efficiency, and generalization ability across different formation configurations.
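The abstract names three ingredients without giving formulas: dueling aggregation, a double-DQN target, and an imitation-based action selection strategy. The Python sketch below illustrates the standard textbook forms of the first two and one plausible form of the third. It is not the authors' implementation: the linear decay schedule, the `demo_action` argument (standing in for a non-expert demonstrator's choice), and all function names are assumptions introduced here for illustration.

```python
import random
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Standard double-DQN target: the online network chooses the next
    action, the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def imitation_prob(step: int, decay_steps: int, start: float = 1.0) -> float:
    """ASSUMPTION: probability of following the demonstrator, decayed
    linearly to zero over `decay_steps`. The abstract does not specify
    the schedule."""
    return max(0.0, start * (1.0 - step / decay_steps))


def select_action(q_values: Sequence[float], demo_action: int, step: int,
                  decay_steps: int, rng: random.Random) -> int:
    """ASSUMPTION: early in training, follow the (non-expert) demonstrator's
    action with high probability; otherwise act greedily on Q."""
    if rng.random() < imitation_prob(step, decay_steps):
        return demo_action
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    q = dueling_q(1.0, [1.0, 2.0, 3.0])
    print(q)
    print(double_dqn_target(1.0, 0.5, False, [0.1, 0.9], [4.0, 2.0]))
    print(select_action(q, demo_action=0, step=0, decay_steps=100, rng=rng))
```

Under this sketch the agent starts by copying the demonstrator almost always and shifts to its own greedy policy as training proceeds, which is one way to read "reducing the blindness of exploration at the beginning of training".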
Pages: 11