Scalable and fault tolerant orthogonalization based on randomized distributed data aggregation

被引:19
作者
Gansterer, Wilfried N. [1 ]
Niederbrucker, Gerhard [1 ]
Strakova, Hana [1 ]
Grotthoff, Stefan Schulze [1 ]
机构
[1] Univ Vienna, Res Grp Theory & Applicat Algorithms, A-1090 Vienna, Austria
基金
奥地利科学基金会;
关键词
Distributed reduction operation; Push-flow algorithm; Distributed orthogonalization; Distributed matrix computations; Fault tolerant matrix computations; GOSSIP; ALGORITHM;
D O I
10.1016/j.jocs.2013.01.006
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The construction of distributed algorithms for matrix computations built on top of distributed data aggregation algorithms with randomized communication schedules is investigated. For this purpose, a new aggregation algorithm for summing or averaging distributed values, the push-flow algorithm, is developed, which achieves superior resilience properties with respect to failures compared to existing aggregation methods. It is illustrated that on a hypercube topology it asymptotically requires the same number of iterations as the optimal all-to-all reduction operation and that it scales well with the number of nodes. Orthogonalization is studied as a prototypical matrix computation task. A new fault tolerant distributed orthogonalization method rdmGS, which can produce accurate results even in the presence of node failures, is built on top of distributed data aggregation algorithms. (C) 2013 Elsevier B.V. All rights reserved.
引用
收藏
页码:480 / 488
页数:9
相关论文
共 32 条
[21]  
Kermarrec Anne-Marie, 2007, Operating Systems Review, V41, P2, DOI 10.1145/1317379.1317381
[22]  
Mosk-Aoyama D., 2006, P 25 ANN ACM S PRINC, P113, DOI DOI 10.1145/1146381.1146401
[23]   CONVERGENCE SPEED IN DISTRIBUTED CONSENSUS AND AVERAGING [J].
Olshevsky, Alex ;
Tsitsiklis, John N. .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2009, 48 (01) :33-55
[24]   Improving the performance of coordinated checkpointers on networks of workstations using RAID techniques [J].
Plank, JS .
15TH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 1996, :76-85
[25]   Diskless checkpointing [J].
Plank, JS ;
Li, K ;
Puening, MA .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1998, 9 (10) :972-986
[26]   Network robustness and irreversibility of information diffusion in Complex networks [J].
Safar, Maytham ;
Mahdi, Khaled ;
Torabi, Sadeq .
JOURNAL OF COMPUTATIONAL SCIENCE, 2011, 2 (03) :198-206
[27]   Bloom: A stochastic growth-based fast method of community detection in networks [J].
Schumm, Phillip ;
Scoglio, Caterina .
JOURNAL OF COMPUTATIONAL SCIENCE, 2012, 3 (05) :356-366
[28]   Gossip Algorithms [J].
Shah, Devavrat .
FOUNDATIONS AND TRENDS IN NETWORKING, 2008, 3 (01) :1-125
[29]  
Strakova H., 2013, P 21 EUR INT C PAR D
[30]  
Straková H, 2012, LECT NOTES COMPUT SC, V7203, P235, DOI 10.1007/978-3-642-31464-3_24