FLAIR: A Fast and Low-Redundancy Failure Recovery Framework for Inter Data Center Network

Cited by: 1
Authors
Zhang, Yuchao [1]
Huang, Haoqiang [1]
Abdelmoniem, Ahmed M. [2]
Zeng, Gaoxiong [3]
Zheng, Chenyue [1]
Que, Xirong [1]
Wang, Wendong [1]
Xu, Ke [4]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
[2] Queen Mary Univ London, London E14NS, England
[3] Huawei Technol, Shenzhen 518129, Peoples R China
[4] Tsinghua Univ, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Routing; Redundancy; Processor scheduling; Optimization; Delays; Network topology; Data centers; Inter data center network; failure recovery; routing optimization
DOI
10.1109/TCC.2024.3393735
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Driven by the rapid development of 5G and IoT technologies, inter-datacenter (Inter-DC) networks face unprecedented pressure to replicate large volumes of geographically distributed user data in real time. Meanwhile, as Inter-DC networks grow in scale, link and node failures become increasingly frequent and degrade data transmission efficiency, which makes link failure recovery methods critically important. Many works have investigated fast failure recovery, yet none of them considers the deployment overhead of such recovery schemes. In this article, we find that the side effects of deploying recovery strategies and the future availability of the recovered transmissions are also crucial for fast recovery. We therefore propose FLAIR, a fast and low-redundancy failure recovery framework that consists of a fast recovery strategy, FRAVaR, and a redundancy removal algorithm, ROSE. FRAVaR takes deployment overhead fully into account by minimizing shuffle traffic. Building on it, ROSE regularly eliminates the cumulative rerouting redundancy by removing unnecessary routing updates. Experimental results on four realistic network topologies show that FLAIR reduces deployment overhead by up to 48.2% compared with state-of-the-art solutions, which in turn shortens recovery time by up to 70.2% and improves network utilization by up to 36%.
Pages: 737-749
Number of pages: 13