Divide-and-conquer approach for solving singular value decomposition based on MapReduce

被引:4
作者
Zhao, Shuoyi [1 ]
Li, Ruixuan [1 ]
Tian, Wenlong [2 ]
Xiao, Weijun [3 ]
Dong, Xinhua [1 ]
Liao, Dongjie [1 ]
Khan, Samee U. [4 ]
Li, Keqin [5 ]
机构
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan 430074, Hubei, Peoples R China
[3] Virginia Commonwealth Univ, Dept Elect & Comp Engn, Richmond, VA 23284 USA
[4] N Dakota State Univ, Dept Elect & Comp Engn, Fargo, ND 58108 USA
[5] State Univ New York New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
基金
中国国家自然科学基金; 国家高技术研究发展计划(863计划);
关键词
distributed computation; divide-and-conquer; MapReduce; singular value decomposition;
D O I
10.1002/cpe.3436
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Singular value decomposition (SVD) shows strong vitality in the area of information analysis and has significant application value in most of the scientific big data fields. However, with the rapid development of Internet, the information online reveals fast growing trend. For a large-scale matrix, applying SVD computation directly is both time consuming and memory demanding. There are many works available to speed up the computation of SVD based on the message passing interface model. However, to deal with large-scale data processing, a MapReduce model has many advantages over a message passing interface model, such as fault tolerance, load balancing and simplicity. For a MapReduce environment, existing approaches only focus on low rank SVD approximation and tall-and-skinny matrix SVD computation, and there are no implementations of full rank SVD computation. In this paper, we propose a MapReduce-based implementation for solving divide-and-conquer SVD algorithm. To achieve high performance, we design a two-stage task scheduling strategy based on the mathematical characteristics of divide-and-conquer SVD algorithm. To further strengthen the performance, we propose a row-index-based divide algorithm, a pipelined task scheduling method, and revised block matrix multiplication in MapReduce framework. Experimental result shows the efficiency of our algorithm. Our implementation can accommodate full rank SVD computation of large-scale matrix very efficiently. Copyright (c) 2014 John Wiley & Sons, Ltd.
引用
收藏
页码:331 / 350
页数:20
相关论文
共 24 条
[1]  
Anderson M., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P48, DOI 10.1109/IPDPS.2011.15
[2]  
Bayramli B., 2013, ARXIV13104664
[3]  
Benson Austin R., 2013, 2013 IEEE International Conference on Big Data, P264, DOI 10.1109/BigData.2013.6691583
[4]  
Chung J, 2011, HDB MATH METHODS IMA, P43, DOI [DOI 10.1007/978-0-387-92920-02, 10.1007/978-0-387-92920-0_2, DOI 10.1007/978-0-387-92920-0_2]
[5]  
Constantine PG, 2012, INT CONF ACOUST SPEE, P5333, DOI 10.1109/ICASSP.2012.6289125
[6]  
Constantine Paul G, 2011, 2 INT WORKSH MAPREDU, P43
[7]  
CUPPEN JJM, 1981, NUMER MATH, V36, P177, DOI 10.1007/BF01396757
[8]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDE '04), P137
[9]  
Drmac Z, 2007, SIAM J MATRIX ANAL A, V29, P1322, DOI 10.1137/050639193
[10]  
Golub GH., 2013, MATRIX COMPUTATIONS