Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization

Cited: 58
Authors
Chen, Long [1 ]
Hu, Bin [1 ]
Guan, Zhi-Hong [1 ]
Zhao, Lian [2 ]
Shen, Xuemin [3 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Ryerson Univ, Dept Elect Comp & Biomed Engn, Toronto, ON M5B 2K3, Canada
[3] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Funding
National Natural Science Foundation of China;
Keywords
Routing; Optimization; Task analysis; Heuristic algorithms; Spread spectrum communication; Routing protocols; Training; Adaptive routing; metapolicy gradient; multiagent; reinforcement learning (RL); NETWORKS; CHALLENGES; ENERGY;
DOI
10.1109/TNNLS.2021.3070584
Chinese Library Classification
TP18 (Theory of Artificial Intelligence);
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this article, we investigate the routing problem of packet networks through multiagent reinforcement learning (RL), a challenging topic in distributed and autonomous networked systems. Specifically, the routing problem is modeled as a networked multiagent partially observable Markov decision process (MDP). Since the MDP of a network node is affected not only by its neighboring nodes' policies but also by the network traffic demand, routing becomes a multitask learning problem. Inspired by the recent success of RL and meta-learning, we propose two novel model-free multiagent RL algorithms, multiagent proximal policy optimization (MAPPO) and multiagent meta-proximal policy optimization (meta-MAPPO), to optimize network performance under fixed and time-varying traffic demand, respectively. A practicable distributed implementation framework is designed based on the separability of exploration and exploitation in training MAPPO. Simulation results demonstrate that the proposed algorithms outperform existing routing optimization policies.
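Both MAPPO and meta-MAPPO build on proximal policy optimization (PPO). As a minimal illustration only, the clipped surrogate loss at the heart of PPO can be sketched as below; this is the standard single-agent form, not the authors' exact multiagent objective, and the function name and clipping parameter are assumptions:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate loss (illustrative sketch).

    ratio:     pi_new(a|s) / pi_old(a|s) per sampled action
    advantage: advantage estimate per sampled action
    eps:       clipping parameter (0.2 is a common default)
    Returns the negated clipped objective, to be minimized.
    """
    unclipped = ratio * advantage
    # Clip the probability ratio to [1 - eps, 1 + eps] to limit policy updates.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic (elementwise minimum) bound, averaged over the batch.
    return -np.minimum(unclipped, clipped).mean()

# Example: a ratio of 1.5 is clipped to 1.2 when the advantage is positive.
loss = ppo_clip_loss(np.array([1.5]), np.array([1.0]))
```

In the paper's multiagent setting, each network node would train its own policy with an objective of this shape, using locally observed rewards and advantages.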
Pages: 5374-5386
Page count: 13