Research on MapReduce Based Incremental Iterative Model and Framework

Times cited: 0
Authors
Song, Jie [1]
Guo, Chaopeng [1]
Zhang, Yichuan [1]
Zhu, Zhiliang [1]
Yu, Ge [2]
Affiliations
[1] Software College, Northeastern University, Shenyang 110819, People's Republic of China
[2] School of Information Science and Engineering, Northeastern University, Shenyang 110819, People's Republic of China
Funding
National Research Foundation of Singapore; National Natural Science Foundation of China
Keywords
Big data; Incremental iterative; Iterative computing; MapReduce
DOI
10.1080/03772063.2014.987703
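Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809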
Abstract
In the big data environment, MapReduce can improve the efficiency of iterative algorithms on massive data by running them on large PC clusters. However, re-iterating over the entire dataset whenever new data arrives is inefficient. This paper proposes an incremental iterative computing model (I2M) that computes from the incremental data and the original iterative results. MapReduce- and I2M-based implementations of descendant query, PageRank, and K-means are then presented. Finally, an incremental iterative computing framework (I2F) is implemented by extending HaLoop to support incremental iterative computing. A series of test cases evaluates I2F in terms of functionality, performance, and the cost of incremental iteration. The proposed incremental iterative model can be adapted to many iterative algorithms and can promote the application and optimization of iterative algorithms in the big data environment.
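The abstract says I2M computes from the incremental data and the original iterative results instead of re-iterating from scratch. The Python sketch that follows illustrates that idea for PageRank only in its simplest form, a warm start: after new data arrives, power iteration resumes from the previous ranks rather than from a uniform vector. This is an illustration under stated assumptions, not the authors' I2M or I2F: the function `pagerank`, its parameters, and the choice to seed new nodes with the teleport mass `(1 - damping) / n` are hypothetical, and unlike a true incremental scheme it still sweeps the whole graph on every iteration.

```python
def pagerank(graph, damping=0.85, tol=1e-10, init=None, max_iter=1000):
    """Power-iteration PageRank on an adjacency dict {node: [out-neighbours]}.

    Returns (ranks, iterations). If `init` is given (e.g. ranks from a
    previous run), iteration starts from it; this is the warm start.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for outs in graph.values():
        nodes.update(outs)
    n = len(nodes)

    if init is None:
        # Cold start: uniform distribution over all nodes.
        ranks = {v: 1.0 / n for v in nodes}
    else:
        # Warm start: reuse old ranks. Seeding unseen nodes with the teleport
        # mass is a hypothetical choice; the vector is renormalised to sum 1.
        ranks = {v: init.get(v, (1.0 - damping) / n) for v in nodes}
        total = sum(ranks.values())
        ranks = {v: r / total for v, r in ranks.items()}

    for it in range(1, max_iter + 1):
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            outs = graph.get(v, [])
            if outs:
                share = damping * ranks[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Nodes without out-links spread their rank uniformly.
                dangling += damping * ranks[v]
        for v in nodes:
            new[v] += dangling / n
        # L1 change between successive iterates decides convergence.
        delta = sum(abs(new[v] - ranks[v]) for v in nodes)
        ranks = new
        if delta < tol:
            return ranks, it
    return ranks, max_iter


# Original data set and its converged ranks.
base = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
old_ranks, _ = pagerank(base)

# New data arrives: node "e" with a link to "a".
updated = dict(base)
updated["e"] = ["a"]

cold_ranks, cold_iters = pagerank(updated)                  # recompute from scratch
warm_ranks, warm_iters = pagerank(updated, init=old_ranks)  # reuse old results
print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters)
```

Because the damped PageRank update is a contraction, both runs reach the same fixed point and their ranks differ only within the convergence tolerance. The warm start can only change how many iterations that takes, and whether it saves any depends on how strongly the new data perturbs the old ranks.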
Pages: 32-40
Number of pages: 9