Interactive Reinforcement Learning Strategy

Cited by: 1
Authors
Shi, Zhenjie [1 ]
Ma, Wenming [1 ]
Yin, Shuai [1 ]
Zhang, Hailiang [1 ]
Zhao, Xiaofan [1 ]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai, Peoples R China
Source
2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021) | 2021
Keywords
Reinforcement learning; interactive learning; path planning; Q-learning;
DOI
10.1109/SWC50871.2021.00075
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The birth of AlphaGo set off a new wave of interest in reinforcement learning, which has become one of the most active directions in artificial intelligence. Its essence is the continuous integration and refinement of machine learning methods, with agents learning through trial and error to accumulate reward. Q-learning is the most widely used reinforcement learning method, but it suffers from several problems: little information is available early in training, learning times are long, learning efficiency is low, and trial and error is repeated excessively. Consequently, Q-learning cannot be applied directly in real environments. To address this problem, the authors propose an interactive learning method that combines voice commands with Q-learning. The method uses limited interaction between the agent and a human speaker to identify a coarse target region in the early stage of learning, and then progressively narrows the search range, guiding the agent to learn quickly and reducing the blindness of exploration. Simulation experiments show that, compared with the standard Q-learning algorithm, the proposed algorithm improves convergence speed, shortens learning time, and reduces the number of collisions, enabling the agent to quickly find a better collision-free path.
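The abstract describes the method only at a high level. Purely as an illustration of the idea, the Python sketch below combines standard tabular Q-learning with a hypothetical voice_bonus shaping term whose indicated target region shrinks over episodes; the grid size, reward values, and shrinking schedule are assumptions made for this example, not the authors' published implementation.

import random

GRID = 10                                  # 10x10 grid world (assumed)
GOAL = (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
EPISODES, MAX_STEPS = 200, 500

Q = {((x, y), a): 0.0
     for x in range(GRID) for y in range(GRID)
     for a in range(len(ACTIONS))}

def step(state, a):
    # Apply an action; leaving the grid counts as a collision.
    x, y = state
    dx, dy = ACTIONS[a]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < GRID and 0 <= ny < GRID):
        return state, -5.0, False          # collision: penalty, stay in place
    if (nx, ny) == GOAL:
        return (nx, ny), 10.0, True        # goal reached
    return (nx, ny), -1.0, False           # ordinary move costs one step

def dist(s):
    # Manhattan distance to the goal.
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])

def voice_bonus(state, next_state, episode):
    # Hypothetical interactive shaping: early on, a voice command marks a
    # broad region around the target; the region's radius shrinks as training
    # proceeds, narrowing the agent's search range step by step.
    radius = max(1, GRID - GRID * episode // EPISODES)
    if dist(next_state) <= radius and dist(next_state) < dist(state):
        return 2.0                         # moved deeper into the indicated region
    return 0.0

def choose(state):
    # Epsilon-greedy action selection over the tabular Q-function.
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

for episode in range(EPISODES):
    state = (0, 0)
    for _ in range(MAX_STEPS):
        a = choose(state)
        nxt, r, done = step(state, a)
        r += voice_bonus(state, nxt, episode)   # add the interactive guidance
        best_next = max(Q[(nxt, b)] for b in range(len(ACTIONS)))
        Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
        state = nxt
        if done:
            break

In this sketch the guidance only rewards moves that enter the currently indicated region, so late in training it degenerates to near-standard Q-learning, mirroring the coarse-to-fine narrowing the abstract describes.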
Pages: 507-512
Number of pages: 6
Related papers
50 records in total
  • [31] A Robust Exploration Strategy in Reinforcement Learning Based on Temporal Difference Error
    Hajar, Muhammad Shadi
    Kalutarage, Harsha
    Al-Kadri, M. Omar
    AI 2022: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13728 : 789 - 799
  • [32] An Adaptive Strategy Selection Method With Reinforcement Learning for Robotic Soccer Games
    Shi, Haobin
    Lin, Zhiqiang
    Hwang, Kao-Shing
    Yang, Shike
    Chen, Jialin
    IEEE ACCESS, 2018, 6 : 8376 - 8386
  • [33] Influence zones: A strategy to enhance reinforcement learning
    Braga, Arthur Plinio de S.
    Araujo, Aluizio F. R.
    NEUROCOMPUTING, 2006, 70 (1-3) : 21 - 34
  • [34] Learning a Diagnostic Strategy on Medical Data With Deep Reinforcement Learning
    Zhu, Mengxiao
    Zhu, Haogang
    IEEE ACCESS, 2021, 9 : 84122 - 84133
  • [35] The Study on Interactive Learning Strategy in Digital Campus
    Zhang Ling
    Liu Xiumin
    PROCEEDING OF 2012 INTERNATIONAL SYMPOSIUM - EDUCATIONAL RESEARCH AND EDUCATIONAL TECHNOLOGY, 2012, : 11 - +
  • [36] Self-Augmenting Strategy for Reinforcement Learning
    Huang, Xin
    Xiao, Shuangjiu
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2017), 2017, : 1 - 4
  • [37] The Advance of Reinforcement Learning and Deep Reinforcement Learning
    Lyu, Le
    Shen, Yang
    Zhang, Sicheng
    2022 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, BIG DATA AND ALGORITHMS (EEBDA), 2022, : 644 - 648
  • [38] Interactive Video Corpus Moment Retrieval using Reinforcement Learning
    Ma, Zhixin
    Ngo, Chong Wah
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [39] A Machine of Few Words - Interactive Speaker Recognition with Reinforcement Learning
    Seurin, Mathieu
    Strub, Florian
    Preux, Philippe
    Pietquin, Olivier
    INTERSPEECH 2020, 2020, : 4323 - 4327
  • [40] How to recommend preferable solutions of a user in interactive reinforcement learning?
    Yamaguchi, Tomohiro
    Nishimura, Takuma
    2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7, 2008, : 1968 - 1973