Deep Reinforcement Learning for Humanoid Robot Dribbling

Cited by: 4
Authors
Muzio, Alexandre F. V. [1]
Maximo, Marcos R. O. A. [1]
Yoneyama, Takashi [2]
Affiliations
[1] Aeronautics Institute of Technology, Computer Science Division, Autonomous Computational Systems Laboratory (LAB-SCA), Praça Marechal Eduardo Gomes 50, 12228-900 São José dos Campos, SP, Brazil
[2] Aeronautics Institute of Technology, Electronic Engineering Division, Praça Marechal Eduardo Gomes 50, 12228-900 São José dos Campos, SP, Brazil
Source
2020 XVIII Latin American Robotics Symposium, 2020 XII Brazilian Symposium on Robotics and 2020 XI Workshop of Robotics in Education (LARS-SBR-WRE 2020), 2020
DOI
10.1109/LARS/SBR/WRE51543.2020.9307084
Chinese Library Classification
TP24 [Robotics];
Subject Classification Codes
080202; 1405;
Abstract
Humanoid robot soccer is a long-standing competitive task that pushes the boundaries of state-of-the-art robotics. Among the many challenges of playing soccer is walking and running without losing balance. Deep Reinforcement Learning (DRL) has been used to solve complex continuous control problems such as those found in robotics. In this work, we focus on learning a humanoid robot behavior for dribbling a ball against a single opponent. Instead of learning to control joint commands directly, we adopt an approach in which the learning agent interacts with a predefined walking engine. Using model-free DRL algorithms (namely, Deep Deterministic Policy Gradients, Trust Region Policy Optimization, and Proximal Policy Optimization), we learn a high-level policy that allows a humanoid robot to fulfill this task. Finally, the learned dribble policy was evaluated on a simulated Nao robot from the RoboCup 3D Soccer Simulation League. According to our results, the learned agent surpasses the hand-coded behavior used by the ITAndroids robotics team in the RoboCup competition.
Pages: 246-251
Number of pages: 6
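
To make the control architecture described in the abstract concrete, below is a minimal Python sketch of the core idea: learning a high-level policy on top of a predefined walking engine instead of commanding joints directly. This is not the paper's code. The StubWalkingEngine, the observation/action dimensions, and the shaped reward (ball progress minus a fall penalty) are all illustrative assumptions, and the gymnasium/stable-baselines3 APIs stand in for whatever tooling the authors used. PPO is shown because it is one of the three algorithms the paper evaluates.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class StubWalkingEngine:
    """Toy kinematic stand-in for a predefined walking engine: it integrates
    commanded body velocities and nudges the ball on contact. A real walking
    engine would be substituted here."""

    def reset(self):
        self.robot = np.zeros(3)          # x, y, heading of the robot
        self.ball = np.array([0.3, 0.0])  # ball starts just ahead of the robot
        self.cmd = np.zeros(3)
        self.t = 0
        return self._obs()

    def set_velocity(self, vx, vy, omega):
        self.cmd = np.array([vx, vy, omega])

    def step(self, dt=0.02):
        vx, vy, omega = self.cmd
        c, s = np.cos(self.robot[2]), np.sin(self.robot[2])
        self.robot[:2] += dt * np.array([c * vx - s * vy, s * vx + c * vy])
        self.robot[2] += dt * omega
        old_ball_x = self.ball[0]
        if np.linalg.norm(self.ball - self.robot[:2]) < 0.2:  # contact: push ball
            self.ball += dt * np.array([c * vx, s * vx])
        self.t += 1
        progress = self.ball[0] - old_ball_x  # ball advance toward the +x goal
        fell = False                          # a real engine would report falls
        truncated = self.t >= 500             # fixed-length episode
        return self._obs(), progress, fell, truncated

    def _obs(self):
        return np.concatenate([self.robot, self.ball]).astype(np.float32)


class DribbleEnv(gym.Env):
    """Hypothetical gym wrapper: each action is a (vx, vy, omega) velocity
    request to the walking engine rather than a set of joint commands."""

    def __init__(self):
        super().__init__()
        self.engine = StubWalkingEngine()
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(5,),
                                            dtype=np.float32)
        self.action_space = spaces.Box(
            low=np.array([-0.2, -0.1, -1.0], dtype=np.float32),
            high=np.array([0.4, 0.1, 1.0], dtype=np.float32))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.engine.reset(), {}

    def step(self, action):
        self.engine.set_velocity(*np.asarray(action, dtype=np.float32))
        obs, progress, fell, truncated = self.engine.step()
        reward = progress - (5.0 if fell else 0.0)  # illustrative shaped reward
        return obs, reward, False, truncated, {}


if __name__ == "__main__":
    # Train the high-level policy with PPO, one of the three algorithms the
    # paper compares, here via the stable-baselines3 implementation.
    from stable_baselines3 import PPO

    model = PPO("MlpPolicy", DribbleEnv(), verbose=1)
    model.learn(total_timesteps=50_000)
```

In the paper's actual setup, the observation would presumably also include the opponent's state in the RoboCup 3D simulator, and the walking engine would translate the requested velocities into whole-body joint trajectories for the Nao.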