An ML-Accelerated Framework for Large-Scale Constrained Traffic Engineering

Times cited: 0
Authors
Gu, Cheng [1, 2]
Song, Xin [1]
Ng, Ben Hok [1]
Xiang, Qiao [3]
Guo, Zehua [4]
Li, Geng [1]
Affiliations
[1] Huawei Technologies, Beijing, People's Republic of China
[2] University of Toronto, Toronto, ON, Canada
[3] Xiamen University, Xiamen, People's Republic of China
[4] Beijing Institute of Technology, Zhengzhou Research Institute of BIT, Zhejiang Lab, Beijing, People's Republic of China
Source
2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS 2024) | 2024
Keywords
Traffic Engineering; Machine Learning; Wide-area Network; Optimization
DOI
10.1109/ICDCS60910.2024.00014
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Traffic engineering (TE) mechanisms are crucial for achieving optimal network performance in wide-area networks (WANs) that interconnect geographically distributed datacenters. Existing TE work formulates the problem as a combinatorial optimization problem, which can take hours to solve on modern WAN topologies with thousands of nodes. To improve the performance of TE mechanisms, we introduce DeepTE, a new TE framework based on machine learning (ML) and designed for maximum scalability and performance: on networks with thousands of nodes, it completes its computation within milliseconds and generates near-optimal TE policies while guaranteeing that all constraints are satisfied. DeepTE also uses a distributed ML model architecture that can be scaled horizontally across multiple GPUs for even better performance. Our extensive evaluations of DeepTE on various network topologies and TE problems, using real-world traffic matrices, show that DeepTE produces policies within 5% of the optimal results while offering up to 100x performance improvement over state-of-the-art TE mechanisms.
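The abstract does not say how DeepTE guarantees that its ML output satisfies all constraints. As a hedged illustration only, the sketch below shows one generic way an ML-based TE pipeline can make any prediction feasible: hypothetical per-path model scores are turned into split ratios with a softmax, and each path's flow is then scaled down by the most overloaded link it crosses. The function names (`feasible_allocation`, `link_loads`, `optimality_gap`), the three-node topology, and the use of total satisfied demand as the objective are all assumptions for illustration, not DeepTE's method or API. The gap metric mirrors the abstract's "within 5% of the optimal" comparison under that assumed objective.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Paths = Dict[str, List[List[Edge]]]  # demand id -> candidate paths (lists of links)


def softmax(scores: List[float]) -> List[float]:
    """Turn raw per-path scores into split ratios that are >= 0 and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [x / total for x in exps]


def link_loads(alloc: Dict[str, List[float]], paths: Paths,
               capacity: Dict[Edge, float]) -> Dict[Edge, float]:
    """Sum the flow that every link carries under a per-path allocation."""
    load = {e: 0.0 for e in capacity}
    for d, flows in alloc.items():
        for flow, path in zip(flows, paths[d]):
            for e in path:
                load[e] += flow
    return load


def feasible_allocation(demands: Dict[str, float], paths: Paths,
                        capacity: Dict[Edge, float],
                        scores: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Map hypothetical model scores to per-path flows that never exceed capacity."""
    # Split each demand across its candidate paths using the predicted ratios.
    alloc = {d: [demands[d] * r for r in softmax(scores[d])] for d in demands}
    load = link_loads(alloc, paths, capacity)
    # Per-link shrink factor: 1 if the link fits, otherwise capacity / load.
    shrink = {e: min(1.0, capacity[e] / load[e]) if load[e] > 0 else 1.0
              for e in capacity}
    # Scale each path flow by the tightest link on that path. Every flow crossing
    # link e is multiplied by at most shrink[e], so the new load is <= capacity[e].
    return {d: [f * min(shrink[e] for e in path) for f, path in zip(alloc[d], paths[d])]
            for d in alloc}


def optimality_gap(achieved: float, optimal: float) -> float:
    """Relative distance from the optimal objective (0.05 means 'within 5%')."""
    return (optimal - achieved) / optimal


# Hypothetical toy instance: demand A->C with a direct path and a path via B.
demands = {"A->C": 10.0}
paths = {"A->C": [[("A", "C")], [("A", "B"), ("B", "C")]]}
capacity = {("A", "C"): 4.0, ("A", "B"): 8.0, ("B", "C"): 5.0}
OPTIMUM = 9.0  # max satisfiable demand: 4 on the direct link + 5 via B

for name, scores in {"balanced": [0.0, 0.0], "skewed": [2.0, 0.0]}.items():
    alloc = feasible_allocation(demands, paths, capacity, {"A->C": scores})
    achieved = sum(sum(flows) for flows in alloc.values())
    print(f"{name}: total flow {achieved:.2f}, gap {optimality_gap(achieved, OPTIMUM):.1%}")
```

Running it prints `balanced: total flow 9.00, gap 0.0%` and `skewed: total flow 5.19, gap 42.3%`. Feasibility holds for any scores, but how close the result gets to the optimum depends entirely on the quality of the predicted split ratios; simple scale-down trades some optimality for a cheap, always-feasible output.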
Pages: 47-58
Page count: 12
Related papers
38 entries in total
[1] Abuzaid, F. Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI '21), 2021, p. 175.
[2] Almasan, P. arXiv, 2020.
[3] Anand, R. Journal of Statistics and Management Systems, 2017, 20, p. 623. DOI: 10.1080/09720510.2017.1395182.
[4] Bengio, Yoshua; Lodi, Andrea; Prouvost, Antoine. Machine learning for combinatorial optimization: A methodological tour d'horizon. European Journal of Operational Research, 2021, 290(2), pp. 405-421.
[5] Bernardez, Guillermo; Suarez-Varela, Jose; Lopez, Albert; Wu, Bo; Xiao, Shihan; Cheng, Xiangle; Barlet-Ros, Pere; Cabellos-Aparicio, Albert. Is Machine Learning Ready for Traffic Engineering Optimization? 2021 IEEE 29th International Conference on Network Protocols (ICNP 2021), 2021.
[6] Blondel, Vincent D.; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008.
[7] Bogle, Jeremy; Bhatia, Nikhil; Ghobadi, Manya; Menache, Ishai; Bjorner, Nikolaj; Valadarsky, Asaf; Schapira, Michael. TEAVAR: Striking the Right Utilization-Availability Balance in WAN Traffic Engineering. SIGCOMM '19: Proceedings of the ACM Special Interest Group on Data Communication, 2019, pp. 29-43.
[8] Castello, Adrian; Dolz, Manuel F.; Quintana-Orti, Enrique S.; Duato, Jose. Analysis of Model Parallelism for Distributed Neural Networks. EuroMPI '19: Proceedings of the 26th European MPI Users' Group Meeting, 2019.
[9] Chen, D. L. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34, p. 3438.
[10] Chen, Li; Lingys, Justinas; Chen, Kai; Liu, Feng. AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Automatic Traffic Optimization. Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '18), 2018, pp. 191-205.