Accelerating wargaming reinforcement learning by dynamic multi-demonstrator ensemble

Cited by: 5
Authors
Dong, Liwei [1 ]
Li, Ni [1 ,2 ]
Yuan, Haitao [1 ]
Gong, Guanghong [1 ]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[2] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
Keywords
Reinforcement learning; Wargaming; Decision-making; Expert demonstrations
DOI
10.1016/j.ins.2023.119534
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Deep Reinforcement Learning (DRL) has become a promising technique for tackling difficult wargaming decision-making problems. However, DRL suffers from inherently low learning efficiency and often requires a massive number of training steps, a cost that can be alleviated with expert demonstrations in wargaming domains. Most learning-from-demonstration methods treat the demonstration data from different expert demonstrators without distinction. In addition, a more appropriate and effective mechanism is needed to control the sampling balance between expert-generated demonstration samples and agent-generated interaction samples. To tackle these two issues, this work proposes an improved approach that leverages expert demonstrations to further accelerate DRL. It extracts the inherent diversity among multiple demonstrators by pre-training agents individually on each demonstration source, thereby producing a strong initial ensemble model. In addition, a novel technique for evaluating the learning importance of each demonstrator is designed to dynamically tune the sampling ratios of learning data in a more adaptive and effective manner. Evaluated on several classic game tasks and a typical wargaming scenario, our method outperforms several state-of-the-art methods and significantly raises DRL's efficiency for typical wargaming decision-making applications.
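The abstract describes two mechanisms: per-demonstrator pre-training to build an initial ensemble, and importance-driven tuning of the sampling ratio between expert and agent data. The following is a minimal, hypothetical Python sketch of the second mechanism only; the class name DynamicMixedSampler, the softmax-over-importance ratio rule, and the exponential-moving-average update are illustrative assumptions, not the paper's actual algorithm.

import math
import random
from collections import deque

class DynamicMixedSampler:
    """Hypothetical sketch: sample training batches from multiple
    demonstrator buffers plus an agent interaction buffer, weighting
    demonstrators by a dynamically updated importance score."""

    def __init__(self, num_demonstrators, capacity=100_000,
                 agent_fraction=0.5, temperature=1.0):
        self.expert_buffers = [deque(maxlen=capacity)
                               for _ in range(num_demonstrators)]
        self.agent_buffer = deque(maxlen=capacity)
        # One importance score per demonstrator, updated during training.
        self.importance = [1.0] * num_demonstrators
        self.agent_fraction = agent_fraction  # share of each batch from agent data
        self.temperature = temperature        # softmax sharpness for expert ratios

    def add_expert(self, demo_id, transition):
        self.expert_buffers[demo_id].append(transition)

    def add_agent(self, transition):
        self.agent_buffer.append(transition)

    def update_importance(self, demo_id, score):
        # Assumed update rule: exponential moving average of some learning
        # signal, e.g. the evaluation return of the agent pre-trained on
        # this demonstration source.
        self.importance[demo_id] = 0.9 * self.importance[demo_id] + 0.1 * score

    def _expert_ratios(self):
        # Softmax over importance scores -> per-demonstrator sampling ratios.
        exps = [math.exp(s / self.temperature) for s in self.importance]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, batch_size):
        # Fixed agent share; the expert share is split by the dynamic ratios.
        n_agent = min(int(batch_size * self.agent_fraction), len(self.agent_buffer))
        batch = random.sample(list(self.agent_buffer), n_agent)
        n_expert = batch_size - n_agent
        for buf, ratio in zip(self.expert_buffers, self._expert_ratios()):
            k = min(round(n_expert * ratio), len(buf))
            batch.extend(random.sample(list(buf), k))
        random.shuffle(batch)
        return batch

Raising a demonstrator's score shifts future batches toward its buffer; in the paper's setting the score would come from the demonstrator-importance evaluation technique it proposes.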
Pages: 19