Interactive Reinforcement Learning Strategy

Cited by: 1
Authors
Shi, Zhenjie [1 ]
Ma, Wenming [1 ]
Yin, Shuai [1 ]
Zhang, Hailiang [1 ]
Zhao, Xiaofan [1 ]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai, Peoples R China
Source
2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021) | 2021
Keywords
Reinforcement learning; interactive learning; path planning; Q-learning;
DOI
10.1109/SWC50871.2021.00075
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The birth of AlphaGo set off a new wave of interest in reinforcement learning, which has become one of the most active directions in artificial intelligence. Its essence is the continuous integration and refinement of machine learning methods, with agents learning through trial and error to accumulate reward. Q-learning is the most widely used reinforcement learning method, but it suffers from several problems: little information is available early in training, learning times are long, learning efficiency is low, and trial and error is repeated excessively. Consequently, Q-learning cannot be applied directly in real environments. To address this problem, the authors propose an interactive learning method that combines voice commands with Q-learning. The method uses limited interaction between the agent and a human speaker to identify a coarse target region in the early stage of learning, and then progressively narrows the search range, guiding the agent to learn quickly and reducing the blindness of exploration. Simulation experiments show that, compared with the standard Q-learning algorithm, the proposed algorithm improves convergence speed, shortens learning time, and reduces the number of collisions, enabling the agent to quickly find a better collision-free path.
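The abstract describes the method only at a high level. Purely as an illustration of the idea, the Python sketch below combines standard tabular Q-learning with a hypothetical voice_bonus shaping term whose indicated target region shrinks over episodes; the grid size, reward values, and shrinking schedule are assumptions made for this example, not the authors' published implementation.

import random

GRID = 10                                  # 10x10 grid world (assumed)
GOAL = (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
EPISODES, MAX_STEPS = 200, 500

Q = {((x, y), a): 0.0
     for x in range(GRID) for y in range(GRID)
     for a in range(len(ACTIONS))}

def step(state, a):
    # Apply an action; leaving the grid counts as a collision.
    x, y = state
    dx, dy = ACTIONS[a]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < GRID and 0 <= ny < GRID):
        return state, -5.0, False          # collision: penalty, stay in place
    if (nx, ny) == GOAL:
        return (nx, ny), 10.0, True        # goal reached
    return (nx, ny), -1.0, False           # ordinary move costs one step

def dist(s):
    # Manhattan distance to the goal.
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])

def voice_bonus(state, next_state, episode):
    # Hypothetical interactive shaping: early on, a voice command marks a
    # broad region around the target; the region's radius shrinks as training
    # proceeds, narrowing the agent's search range step by step.
    radius = max(1, GRID - GRID * episode // EPISODES)
    if dist(next_state) <= radius and dist(next_state) < dist(state):
        return 2.0                         # moved deeper into the indicated region
    return 0.0

def choose(state):
    # Epsilon-greedy action selection over the tabular Q-function.
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

for episode in range(EPISODES):
    state = (0, 0)
    for _ in range(MAX_STEPS):
        a = choose(state)
        nxt, r, done = step(state, a)
        r += voice_bonus(state, nxt, episode)   # add the interactive guidance
        best_next = max(Q[(nxt, b)] for b in range(len(ACTIONS)))
        Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
        state = nxt
        if done:
            break

In this sketch the guidance only rewards moves that enter the currently indicated region, so late in training it degenerates to near-standard Q-learning, mirroring the coarse-to-fine narrowing the abstract describes.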
Pages: 507-512
Number of pages: 6
Related papers
50 records in total
  • [31] A Robust Exploration Strategy in Reinforcement Learning Based on Temporal Difference Error
    Hajar, Muhammad Shadi
    Kalutarage, Harsha
    Al-Kadri, M. Omar
    AI 2022: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13728 : 789 - 799
  • [32] An Adaptive Strategy Selection Method With Reinforcement Learning for Robotic Soccer Games
    Shi, Haobin
    Lin, Zhiqiang
    Hwang, Kao-Shing
    Yang, Shike
    Chen, Jialin
    IEEE ACCESS, 2018, 6 : 8376 - 8386
  • [33] Influence zones: A strategy to enhance reinforcement learning
    Braga, Arthur Plinio de S.
    Araujo, Aluizio F. R.
    NEUROCOMPUTING, 2006, 70 (1-3) : 21 - 34
  • [34] Learning a Diagnostic Strategy on Medical Data With Deep Reinforcement Learning
    Zhu, Mengxiao
    Zhu, Haogang
    IEEE ACCESS, 2021, 9 : 84122 - 84133
  • [35] The Study on Interactive Learning Strategy in Digital Campus
    Zhang Ling
    Liu Xiumin
    PROCEEDING OF 2012 INTERNATIONAL SYMPOSIUM - EDUCATIONAL RESEARCH AND EDUCATIONAL TECHNOLOGY, 2012, : 11 - +
  • [36] Self-Augmenting Strategy for Reinforcement Learning
    Huang, Xin
    Xiao, Shuangjiu
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2017), 2017, : 1 - 4
  • [37] The Advance of Reinforcement Learning and Deep Reinforcement Learning
    Lyu, Le
    Shen, Yang
    Zhang, Sicheng
    2022 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, BIG DATA AND ALGORITHMS (EEBDA), 2022, : 644 - 648
  • [38] Interactive Video Corpus Moment Retrieval using Reinforcement Learning
    Ma, Zhixin
    Ngo, Chong Wah
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [39] A Machine of Few Words - Interactive Speaker Recognition with Reinforcement Learning
    Seurin, Mathieu
    Strub, Florian
    Preux, Philippe
    Pietquin, Olivier
    INTERSPEECH 2020, 2020, : 4323 - 4327
  • [40] How to recommend preferable solutions of a user in interactive reinforcement learning?
    Yamaguchi, Tomohiro
    Nishimura, Takuma
    2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7, 2008, : 1968 - 1973