Scalable and fault tolerant orthogonalization based on randomized distributed data aggregation

被引:19
作者
Gansterer, Wilfried N. [1 ]
Niederbrucker, Gerhard [1 ]
Strakova, Hana [1 ]
Grotthoff, Stefan Schulze [1 ]
机构
[1] Univ Vienna, Res Grp Theory & Applicat Algorithms, A-1090 Vienna, Austria
基金
奥地利科学基金会;
关键词
Distributed reduction operation; Push-flow algorithm; Distributed orthogonalization; Distributed matrix computations; Fault tolerant matrix computations; GOSSIP; ALGORITHM;
D O I
10.1016/j.jocs.2013.01.006
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The construction of distributed algorithms for matrix computations built on top of distributed data aggregation algorithms with randomized communication schedules is investigated. For this purpose, a new aggregation algorithm for summing or averaging distributed values, the push-flow algorithm, is developed, which achieves superior resilience properties with respect to failures compared to existing aggregation methods. It is illustrated that on a hypercube topology it asymptotically requires the same number of iterations as the optimal all-to-all reduction operation and that it scales well with the number of nodes. Orthogonalization is studied as a prototypical matrix computation task. A new fault tolerant distributed orthogonalization method rdmGS, which can produce accurate results even in the presence of node failures, is built on top of distributed data aggregation algorithms. (C) 2013 Elsevier B.V. All rights reserved.
引用
收藏
页码:480 / 488
页数:9
相关论文
共 32 条
[11]  
Ferreira Kurt, 2011, Tech. Rep. SAND2011-2488
[12]  
Guermouche A, 2011, P 2011 IEEE INT PAR
[13]   Expander graphs and their applications [J].
Hoory, Shlomo ;
Linial, Nathan ;
Wigderson, Avi .
BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 2006, 43 (04) :439-561
[14]   ALGORITHM-BASED FAULT TOLERANCE FOR MATRIX OPERATIONS [J].
HUANG, KH ;
ABRAHAM, JA .
IEEE TRANSACTIONS ON COMPUTERS, 1984, 33 (06) :518-528
[15]   Gossip-based aggregation in large dynamic networks [J].
Jelasity, M ;
Montresor, A ;
Babaoglu, O .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 2005, 23 (03) :219-252
[16]   Fault-Tolerant Aggregation for Dynamic Networks [J].
Jesus, Paulo ;
Baquero, Carlos ;
Almeida, Paulo Sergio .
2010 29TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS SRDS 2010, 2010, :37-43
[17]  
Jesus P, 2009, LECT NOTES COMPUT SC, V5523, P73, DOI 10.1007/978-3-642-02164-0_6
[18]   Randomized rumor spreading [J].
Karp, R ;
Schindelhauer, C ;
Shenker, S ;
Vöcking, B .
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, :565-574
[19]   Gossip-based computation of aggregate information [J].
Kempe, D ;
Dobra, A ;
Gehrke, J .
44TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2003, :482-491
[20]   A decentralized algorithm for spectral analysis [J].
Kempe, David ;
McSherry, Frank .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2008, 74 (01) :70-83