Human-Inspired Meta-Reinforcement Learning Using Bayesian Knowledge and Enhanced Deep Q-Network

Cited by: 0
Authors
Ho, Joshua [1 ,2 ,3 ]
Wang, Chien-Min [1 ,2 ]
King, Chung-Ta [1 ,3 ,4 ]
You, Yi-Hsin [2 ,5 ]
Feng, Chi-Wei [2 ,4 ]
Affiliations
[1] Acad Sinica, TIGP SNHCC, Taipei, Taiwan
[2] Acad Sinica, Inst Informat Sci, 128 Acad Rd, Sect 2, Taipei 115, Taiwan
[3] Natl Tsing Hua Univ, Inst Informat Syst & Applicat, Hsinchu, Taiwan
[4] Natl Tsing Hua Univ, Dept Comp Sci, 101 Sect 2, Kuang Fu Rd, Hsinchu 300044, Taiwan
[5] Natl Taiwan Univ, Dept Comp Sci, 1 Sec 4, Roosevelt Rd, Taipei 106319, Taiwan
Keywords
Human-inspired model; meta-reinforcement learning; Bayesian knowledge; adaptation and generalization; deep Q-network; replay buffer
DOI
10.1142/S1793351X2444001X
CLC classification
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Over the last few decades, there has been growing interest in the multiple, interdisciplinary fields of human-AI computing. In particular, approaches that integrate human perspectives and design with reinforcement learning (RL) have received increasing attention. However, current RL research has yet to fully explore enhancements drawn from human-inspired approaches. In this work, we focus on enabling a meta-reinforcement learning (meta-RL) agent to achieve adaptation and generalization by modeling Markov Decision Processes (MDPs) with Bayesian knowledge and analysis. We introduce a novel framework, human-inspired meta-RL (HMRL), in which the agent takes resilient actions by leveraging a dynamic dense reward built from the knowledge and predictions of a Bayesian analysis. The proposed framework lets the agent learn to generalize and prevents it from failing catastrophically. Experimental results show that our approach helps the agent reduce computational costs while learning to adapt. Beyond the system design, we also extend the framework with algorithmic improvements to deep Q-network (DQN) implementations for more complex future tasks, comparing replay buffers as a way to enhance the optimization process. Finally, we conclude that integrating human-inspired meta-RL can enable learning formulations related to robustness and scalability, pointing to promising directions and more ambitious AI goals in the future.
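The record does not include code, but the reward-shaping idea in the abstract can be illustrated with a minimal, hypothetical Python sketch, assuming a tabular setting. A Beta-Bernoulli posterior over the success probability of each state-action pair stands in for the "Bayesian knowledge," and its predictive mean turns the sparse environment reward into a dynamic dense reward. The class and parameter names (BayesianRewardShaper, bonus_scale) are illustrative, not taken from the paper.

```python
import numpy as np

class BayesianRewardShaper:
    """Illustrative sketch: maintain a Beta posterior over the success
    probability of each (state, action) pair and add its predictive
    mean to the sparse environment reward as a dense shaping bonus."""

    def __init__(self, n_states, n_actions, bonus_scale=0.1):
        # Beta(1, 1) prior: one pseudo-success and one pseudo-failure each
        self.alpha = np.ones((n_states, n_actions))
        self.beta = np.ones((n_states, n_actions))
        self.bonus_scale = bonus_scale

    def update(self, state, action, success):
        # Conjugate Bayesian update from one observed binary outcome
        if success:
            self.alpha[state, action] += 1.0
        else:
            self.beta[state, action] += 1.0

    def shaped_reward(self, state, action, env_reward):
        # Predictive mean of success under the current posterior
        p_success = self.alpha[state, action] / (
            self.alpha[state, action] + self.beta[state, action]
        )
        # Dynamic dense reward: environment reward plus knowledge bonus
        return env_reward + self.bonus_scale * p_success
```

One way to use such a shaper in a meta-RL loop would be to carry the posterior counts across tasks (or reset them partially per MDP), so that accumulated Bayesian knowledge biases early actions toward transitions predicted to succeed.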
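The replay-buffer comparison mentioned for the DQN extension can be sketched in the same spirit. Below, again as an assumed illustration rather than the authors' implementation, a uniform buffer is contrasted with a simplified prioritized variant that samples transitions in proportion to their last absolute TD error; names such as UniformReplayBuffer and eps are hypothetical.

```python
import random
from collections import deque

class UniformReplayBuffer:
    """Baseline DQN buffer: uniform sampling over stored transitions."""

    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        # transition = (state, action, reward, next_state, done)
        self.memory.append(transition)

    def sample(self, batch_size):
        # Copy to a list so random.sample sees a plain sequence
        return random.sample(list(self.memory), batch_size)

class PrioritizedReplayBuffer(UniformReplayBuffer):
    """Simplified prioritized variant: sampling probability is
    proportional to the last absolute TD error plus a small epsilon."""

    def __init__(self, capacity=10_000, eps=1e-3):
        super().__init__(capacity)
        self.priorities = deque(maxlen=capacity)
        self.eps = eps

    def push(self, transition, td_error=1.0):
        super().push(transition)
        self.priorities.append(abs(td_error) + self.eps)

    def sample(self, batch_size):
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        indices = random.choices(range(len(self.memory)),
                                 weights=weights, k=batch_size)
        return [self.memory[i] for i in indices]
```

Because both buffers expose the same push/sample interface, swapping one for the other is enough to run the kind of optimization comparison the abstract describes.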
Pages: 547-569
Page count: 23