Reliability and Survivability Analysis of Data Center Network Topologies

Cited by: 32
Authors
Couto, Rodrigo de Souza [1, 2]
Secci, Stefano [3]
Mitre Campista, Miguel Elias [1]
Maciel Kosmalski Costa, Luis Henrique [1]
Affiliations
[1] Univ Fed Rio de Janeiro, POLI DEL, COPPE PEE GTA, POB 68504, BR-21941972 Rio de Janeiro, RJ, Brazil
[2] Univ Estado Rio de Janeiro, FEN DETEL PEL, BR-20550013 Rio de Janeiro, RJ, Brazil
[3] Univ Paris 06, Sorbonne Univ, UMR 7606, LIP6, F-75005 Paris, France
Keywords
Data center networks; Cloud networks; Survivability; Reliability; Robustness; Availability; Framework; Cost
DOI
10.1007/s10922-015-9354-8
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Several data center architectures have been proposed as alternatives to the conventional three-layer one. Most of them employ commodity equipment to reduce costs. Robustness to failures therefore becomes even more important, because commodity equipment is more failure-prone. Each architecture has a different network topology design with a specific level of redundancy. In this work, we analyze the benefits of different data center topologies, taking reliability and survivability requirements into account. We consider the topologies of three alternative data center architectures: Fat-tree, BCube, and DCell. We also compare these topologies with a conventional three-layer data center topology. For the sake of generality, our analysis is independent of specific equipment, traffic patterns, or network protocols. We derive closed-form formulas for the Mean Time To Failure (MTTF) of each topology. The results allow us to indicate the best topology for each failure scenario. In particular, we conclude that BCube is more robust to link failures than the other topologies, whereas DCell has the most robust topology when switch failures are considered. Additionally, we show that all the alternative topologies considered outperform a three-layer topology for both types of failures. We also determine to what extent the robustness of BCube and DCell is influenced by the number of network interfaces per server.
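The abstract does not state how the MTTF is defined or derived, so the following is only a minimal sketch under loudly stated assumptions, not the paper's closed-form method. It estimates an MTTF by Monte Carlo simulation, assuming (1) only links fail, (2) link lifetimes are independent and exponentially distributed with a common rate, and (3) the network "fails" at the first instant some pair of servers can no longer reach each other. The topology is BCube built from n-port switches in k + 1 levels, which yields n^(k+1) servers with k + 1 interfaces each, so varying k illustrates the interfaces-per-server question raised above. All function names (`bcube_links`, `time_to_disconnection`, `estimate_mttf`) are hypothetical.

```python
import random
from itertools import product


def find(parent, v):
    """Union-find lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def bcube_links(n, k):
    """Hypothetical helper: server-to-switch links of BCube_k with n-port switches.

    Servers carry (k+1)-digit base-n addresses; the level-l switch, identified by
    the remaining k digits, connects the n servers that differ only in digit l.
    Every server therefore has k + 1 links (interfaces).
    """
    links = []
    for addr in product(range(n), repeat=k + 1):
        for level in range(k + 1):
            switch_id = addr[:level] + addr[level + 1:]
            links.append((("srv", addr), ("sw", level, switch_id)))
    return links


def time_to_disconnection(links, servers, rate, rng):
    """One trial: time at which the servers stop being mutually reachable.

    ASSUMPTIONS: only links fail; lifetimes are i.i.d. exponential with `rate`.
    Links are added back in decreasing order of lifetime; the link whose
    addition first joins all servers is the one whose failure disconnects them.
    """
    lifetimes = sorted(((rng.expovariate(rate), link) for link in links),
                       key=lambda pair: pair[0], reverse=True)
    nodes = {v for link in links for v in link}
    server_set = set(servers)
    parent = {v: v for v in nodes}
    servers_in = {v: (1 if v in server_set else 0) for v in nodes}
    for t, (a, b) in lifetimes:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
            servers_in[rb] += servers_in[ra]
            if servers_in[rb] == len(server_set):
                return t
    return 0.0  # servers were never all connected, even with every link up


def estimate_mttf(links, servers, rate=1.0, trials=20000, seed=1):
    """Monte Carlo MTTF estimate: average time to first server disconnection."""
    rng = random.Random(seed)
    total = sum(time_to_disconnection(links, servers, rate, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n, k = 4, 1
    links = bcube_links(n, k)
    servers = [("srv", a) for a in product(range(n), repeat=k + 1)]
    print(f"BCube_{k}, n={n}: {len(servers)} servers, {len(links)} links")
    print("estimated MTTF (in units of 1/rate):", estimate_mttf(links, servers))
```

Sorting links by decreasing lifetime and merging components with union-find finds the disconnection instant in one pass per trial, instead of re-checking connectivity after every removal. As a sanity check, two servers joined by a single link give an estimate near 1/rate, and two servers joined by two parallel links give one near 1.5/rate, the expected maximum of two exponential lifetimes.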
Pages: 346-392
Page count: 47