FLAIR: A Fast and Low-Redundancy Failure Recovery Framework for Inter Data Center Network

Cited by: 1

Authors:

Zhang, Yuchao [1]

Huang, Haoqiang [1]

Abdelmoniem, Ahmed M. [2]

Zeng, Gaoxiong [3]

Zheng, Chenyue [1]

Que, Xirong [1]

Wang, Wendong [1]

Xu, Ke [4]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China

[2] Queen Mary Univ London, London E14NS, England

[3] Huawei Technol, Shenzhen 518129, Peoples R China

[4] Tsinghua Univ, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON CLOUD COMPUTING | 2024 / Vol. 12 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Routing; Redundancy; Processor scheduling; Optimization; Delays; Network topology; Data centers; Inter data center network; failure recovery; routing optimization;

DOI:

10.1109/TCC.2024.3393735

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Driven by the rapid development of 5G and IoT technologies, Inter-Datacenter (Inter-DC) networks face unprecedented pressure to replicate large volumes of geographically distributed user data in real time. Meanwhile, as Inter-DC networks grow in scale, link and node failures become increasingly frequent and degrade data transmission efficiency, which makes link failure recovery methods critically important. Many works have investigated fast failure recovery, yet none of them considers the deployment overhead of such recovery schemes. In this article, we find that the side effects of deploying recovery strategies and the future availability of the recovered transmissions are also crucial for fast recovery. We therefore propose FLAIR, a fast and low-redundancy failure recovery framework that consists of a fast recovery strategy, FRAVaR, and a redundancy removal algorithm, ROSE. FRAVaR takes deployment overhead fully into account by minimizing shuffle traffic. Building on FRAVaR, ROSE regularly eliminates the accumulated rerouting redundancy by removing unnecessary routing updates. Experimental results on four realistic network topologies show that FLAIR reduces deployment overhead by up to 48.2% compared with state-of-the-art solutions, and thereby improves recovery speed by up to 70.2% and network utilization by up to 36%.


Pages: 737-749

Number of pages: 13

22 items in total

[1] FRAVaR: A Fast Failure Recovery Framework for Inter-DC Network
Huang, Haoqiang
Zhang, Yuchao
Wang, Ran
Xiang, Qiao
Wang, Wendong
Que, Xirong
Xu, Ke
2023 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE, WCNC, 2023
[2] REVERT: A Network Failure Recovery Method for Data Center Networks
Cui, Yunhe
Qian, Qing
Shen, Guowei
Guo, Chun
Li, Saifei
ELECTRONICS, 2020, 9 (08) : 1 - 20
[3] Inter-Data Center Network Dimensioning under Time-of-Use Pricing
Kantarci, Burak
Mouftah, Hussein T.
IEEE TRANSACTIONS ON CLOUD COMPUTING, 2016, 4 (04) : 402 - 414
[4] Fast Configuration Change Impact Analysis for Network Overlay Data Center Networks
You, Lizhao
Zhang, Jiahua
Jin, Yili
Tang, Hao
Li, Xiao
IEEE-ACM TRANSACTIONS ON NETWORKING, 2022, 30 (01) : 423 - 436
[5] vFFR: A Very Fast Failure Recovery Strategy Implemented in Devices With Programmable Data Plane
Franco, David
Higuero, Marivi
Sanz, Ane
Unzilla, Juanjo
Huarte, Maider
IEEE OPEN JOURNAL OF THE COMMUNICATIONS SOCIETY, 2024, 5 : 7121 - 7146
[6] Global path and bandwidth scheduling in inter-data-center IP/optical transport network
Zhao, Yang
Wang, Lei
Chen, Xue
Yang, Futao
Shi, Sheping
Wang, Huitao
OPTICAL FIBER TECHNOLOGY, 2016, 30 : 125 - 133
[7] SiaDFP: A Disk Failure Prediction Framework Based on Siamese Neural Network in Large-Scale Data Center
Fang, Xiaoyu
Guan, Wenbai
Li, Jiawen
Cao, Chenhan
Xia, Bin
IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (05) : 2890 - 2903
[8] A Fast Q-Learning Based Data Storage Optimization for Low Latency in Data Center Networks
Liao, Zhuofan
Peng, Jingsheng
Chen, Yuantao
Zhang, Jingyu
Wang, Jin
IEEE ACCESS, 2020, 8 : 90630 - 90639
[9] F2Tree: Rapid Failure Recovery for Routing in Production Data Center Networks
Chen, Guo
Zhao, Youjian
Xu, Hailiang
Pei, Dan
Li, Dan
IEEE-ACM TRANSACTIONS ON NETWORKING, 2017, 25 (04) : 1940 - 1953
[10] An adaptive failure recovery mechanism based on asymmetric routing for data center networks
Liu, Yong
Gu, Huaxi
Wang, Kun
Yu, Xiaoshan
Wang, Yunhao
JOURNAL OF SUPERCOMPUTING, 2021, 77 (02) : 2103 - 2123
