Distributed dynamic fault-tolerant routing in fat tree

Authors
Hu N.-D. [1, 2, 3]
Wang D.-W. [1, 2]
Sun N.-H. [1, 2]
Affiliations
[1] Institute of Computing Technology, Chinese Acad. of Sci.
[2] High Performance Computer Research Center, Chinese Acad. of Sci.
[3] Graduate University of Chinese Acad. of Sci.
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2010, Vol. 33, No. 10
Keywords
Distributed routing; Dynamic fault; Fat tree; Fault tolerance; Link fault message
DOI
10.3724/SP.J.1016.2010.01799
Abstract
Fault tolerance in interconnection networks is becoming increasingly important as cloud computing pushes data centers to adopt very large-scale interconnection networks that connect up to tens of thousands of server nodes. To maintain high availability and high performance in such networks, this paper proposes a distributed, dynamic fault-tolerant routing methodology for fat trees. The methodology combines a link fault message spreading mechanism with a dynamic fault-tolerant routing algorithm to achieve fault tolerance in the fat-tree network. Unlike previous proposals, it neither requires additional network hardware nor lengthens routing paths. The results show that, in an m-port n-tree topology, the methodology tolerates all combinations of m/2-1 faulty links. It also tolerates larger numbers of faulty links with high probability: in an 8-port 3-tree fat tree, a combination of ten faulty links is tolerated with a probability of 99.3 percent. Meanwhile, the fault-tolerant network maintains good performance.
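To give a sense of scale for the figures quoted in the abstract, the following is a minimal sketch that computes the size of an m-port n-tree and the guaranteed fault bound m/2-1. The node, switch, and link counts use the commonly cited definition of the m-port n-tree (2(m/2)^n processing nodes and (2n-1)(m/2)^(n-1) switches); that definition is an assumption here and is not stated in the abstract. The function name `m_port_n_tree_size` is hypothetical, and the sketch does not implement the paper's routing algorithm.

```python
def m_port_n_tree_size(m: int, n: int) -> dict:
    """Size of an m-port n-tree under the commonly used definition (assumed,
    not taken from the abstract), plus the m/2-1 bound the abstract reports."""
    if m < 2 or m % 2 != 0 or n < 1:
        raise ValueError("m must be an even integer >= 2 and n must be >= 1")
    half = m // 2
    nodes = 2 * half ** n                      # processing nodes
    switches = (2 * n - 1) * half ** (n - 1)   # m-port switches
    # Each of the n-1 gaps between adjacent switch levels carries 2*(m/2)^n links.
    switch_links = (n - 1) * 2 * half ** n
    return {
        "nodes": nodes,
        "switches": switches,
        "switch_links": switch_links,
        # Every combination of this many faulty links is tolerated (per the abstract).
        "guaranteed_tolerated_faults": half - 1,
    }


if __name__ == "__main__":
    # The 8-port 3-tree is the configuration mentioned in the abstract.
    print(m_port_n_tree_size(8, 3))
```

Under these assumptions, the 8-port 3-tree has 128 nodes, 80 switches, and 256 switch-to-switch links, and the guaranteed bound is 3 faulty links. The ten-link case quoted in the abstract therefore lies well beyond the guaranteed bound and is covered only probabilistically.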
Pages: 1799-1808
Page count: 9