Opponent portrait for multiagent reinforcement learning in competitive environment

Cited by: 15
Authors
Ma, Yuxi [1 ]
Shen, Meng [2 ]
Zhao, Yuhang [1 ]
Li, Zhao [1 ]
Tong, Xiaoyao [1 ]
Zhang, Quanxin [1 ]
Wang, Zhi [3 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Sch Cyberspace Sci & Technol, Beijing, Peoples R China
[3] Nankai Univ, Coll Cyber Sci, Tianjin 300071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
deep reinforcement learning; intention inference; knowledge graph; multiagent system; opponent modeling
DOI
10.1002/int.22594
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Existing investigations of opponent modeling and intention inference cannot clearly describe or practically explain an opponent's behaviors and intentions, which inevitably limits their applicability. In this work, we propose a novel approach to explaining an opponent's policy and inferring its intention based on a behavioral portrait of the opponent. Specifically, we use the multiagent deep deterministic policy gradient (MADDPG) algorithm to train the agent and the opponent in a competitive environment, and we collect the opponent's behavioral data from the agent's observations. We then perform pattern segmentation and extract the opponent's behavior events via the Toeplitz inverse covariance-based clustering (TICC) algorithm, so that the opponent's behavioral data can be encoded into a knowledge graph, named the opponent behavior knowledge graph (OKG). On this basis, we build a question-answering (QA) system to query and match the opponent's historical information in the OKG, allowing the agent to accumulate additional experience and gradually infer the opponent's intention over successive episodes. We evaluate the proposed method on a competitive scenario in the multiagent particle environment (MPE). Simulation results show that agents equipped with the opponent portrait learn better policies in competitive settings.
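The segmentation step of the pipeline (trajectory collection, then TICC-style event extraction) can be sketched in a few lines. The snippet below is a minimal, self-contained illustration, not the authors' code: it stacks sliding windows of the agent's observations of the opponent and clusters them, using scikit-learn's k-means as a simplified stand-in for TICC (reference [9]), which instead fits a Toeplitz-structured inverse covariance per cluster via ADMM and penalizes label switching for temporal consistency. The function names and the placeholder trajectory are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def stack_windows(obs, w):
    """Stack w consecutive observations into one row per time step,
    mimicking TICC's short-subsequence view of the time series."""
    T, d = obs.shape
    return np.stack([obs[t:t + w].ravel() for t in range(T - w + 1)])

def segment_opponent_trajectory(obs, n_events=5, window=4, seed=0):
    """Assign each time step to a behavior-event cluster.
    K-means is a stand-in here; TICC would fit a Toeplitz inverse
    covariance (Markov random field) per cluster instead."""
    X = stack_windows(obs, window)
    return KMeans(n_clusters=n_events, n_init=10,
                  random_state=seed).fit_predict(X)

# Hypothetical usage: obs is a (T, d) array of the agent's observations
# of the opponent (e.g., relative position and velocity in MPE).
obs = np.random.randn(200, 4)            # placeholder trajectory
events = segment_opponent_trajectory(obs)
# Consecutive runs of one label form a candidate "behavior event",
# which would become a node in the opponent behavior knowledge graph
# (OKG) that the QA system later queries.
```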
Pages: 7461-7474
Page count: 14
References
35 items in total
  • [1] Bhatt V., 2021, arXiv:2103.02150
  • [2] Busoniu L., Babuska R., De Schutter B., A comprehensive survey of multiagent reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2008, 38(2): 156-172
  • [3] Busoniu L., 2010, Studies in Computational Intelligence, Vol. 310, p. 183
  • [4] Davies I., 2020, arXiv:2006.03923
  • [5] Duan Y., 2016, RL²: Fast reinforcement learning via slow reinforcement learning, arXiv:1611.02779
  • [6] Ganzfried S., 2011, Proc. International Conference on Autonomous Agents and Multiagent Systems, Vol. 2, p. 533
  • [7] Hadfield-Menell D., 2017, arXiv:1711.02827
  • [8] Hadfield-Menell D., 2016, Advances in Neural Information Processing Systems, Vol. 29
  • [9] Hallac D., Vare S., Boyd S., Leskovec J., Toeplitz inverse covariance-based clustering of multivariate time series data, KDD'17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 215-223
  • [10] Han D., Yuan X., A note on the alternating direction method of multipliers, Journal of Optimization Theory and Applications, 2012, 155(1): 227-238