Scalable Methods and Algorithms

Cited by: 3
Authors
Astsatryan, Hrachya [1]
Kocharyan, Aram [2]
Hagimont, Daniel [2]
Lalayan, Arthur [3]
Affiliations
[1] Natl Acad Sci Republ Armenia, Inst Informat & Automat Problems, Yerevan 0014, Armenia
[2] Univ Fed Toulouse Midi Pyrenees, Toulouse 7, France
[3] Natl Polytech Univ Armenia, Yerevan 0009, Armenia
Keywords
Hadoop; Spark; data compression; CPU/IO tradeoff; performance optimization; MANAGEMENT
DOI
10.2478/cait-2020-0056
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Optimizing the processing of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented in Apache Hadoop and Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and the transfer time between disks and memory, but requires additional processing. Finding the optimal tradeoff is therefore a challenge: a high compression factor may reduce the Input/Output load but overload the processor. The paper aims to present a system that selects compression tools and tunes the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
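The CPU/IO tradeoff described in the abstract can be illustrated with a toy cost model. This is a minimal sketch, not the authors' system: it assumes Python's standard `zlib` module as a stand-in for the Hadoop/Spark compression codecs, and it assumes a hypothetical cost model in which the total time is the CPU time spent compressing plus the compressed size divided by an assumed I/O bandwidth. The helper names `estimate_total_time` and `pick_level` are hypothetical.

```python
import time
import zlib


def estimate_total_time(data: bytes, level: int, io_bytes_per_sec: float) -> tuple[float, int]:
    """Return (estimated seconds, compressed size) for writing `data` at one zlib level.

    Hypothetical cost model: measured CPU time to compress, plus the time to move
    the compressed bytes through an I/O channel of the assumed bandwidth.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)  # level 0 = no compression, 9 = maximum
    cpu_seconds = time.perf_counter() - start
    io_seconds = len(compressed) / io_bytes_per_sec
    return cpu_seconds + io_seconds, len(compressed)


def pick_level(data: bytes, io_bytes_per_sec: float, levels=range(0, 10)):
    """Try each compression level and return the one with the lowest estimated total time."""
    results = {}
    for level in levels:
        results[level] = estimate_total_time(data, level, io_bytes_per_sec)
    best = min(results, key=lambda lvl: results[lvl][0])
    return best, results


if __name__ == "__main__":
    sample = b"hadoop spark compression " * 4000  # hypothetical, highly repetitive input
    for bandwidth in (1e6, 1e9):  # assumed slow vs. fast I/O, in bytes per second
        best, results = pick_level(sample, bandwidth)
        print(f"bandwidth={bandwidth:.0e} B/s -> best level {best}, "
              f"compressed size {results[best][1]} bytes")
```

With slow I/O, the transfer term dominates and higher compression levels tend to win; with fast I/O, the CPU term dominates and lighter compression (or none) becomes competitive. The measured timings vary between runs and machines, so the chosen level is only indicative.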
Pages: 5-17
Number of pages: 13
References (27 in total)
[1] [Anonymous], 2013, P C EXTR SCI ENG DIS.
[2] [Anonymous], DEC, DOI 10.1016/J.CIRESP.2018.05.006.
[3] [Anonymous], 2013, PROFESSIONAL HADOOP.
[4] Astsatryan, Hrachya; Narsisian, Wahi; Kocharyan, Aram; Da Costa, Georges; Hankel, Albert; Oleksiak, Ariel. Energy optimization methodology for e-infrastructure providers [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (10).
[5] Astsatryan H, 2015, ROEDUNET IEEE, P28, DOI 10.1109/RoEduNet.2015.7311823.
[6] Chen, Jinchuan; Chen, Yueguo; Du, Xiaoyong; Li, Cuiping; Lu, Jiaheng; Zhao, Suyun; Zhou, Xuan. Big data challenge: a data management perspective [J]. FRONTIERS OF COMPUTER SCIENCE, 2013, 7 (02): 157-164.
[7] Chen Y., 2010, GREEN NETWORKING, P23, DOI 10.1145/1851290.1851296.
[8] Cheng, Dazhao; Zhou, Xiaobo; Lama, Palden; Ji, Mike; Jiang, Changjun. Energy Efficiency Aware Task Assignment with DVFS in Heterogeneous Hadoop Clusters [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (01): 70-82.
[9] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137.
[10] Fenwick, PM. The Burrows-Wheeler transform for block sorting text compression: Principles and improvements [J]. COMPUTER JOURNAL, 1996, 39 (09): 731-740.