PERFORMANCE-EFFICIENT RECOMMENDATION AND PREDICTION SERVICE FOR BIG DATA FRAMEWORKS FOCUSING ON DATA COMPRESSION AND IN-MEMORY DATA STORAGE INDICATORS

Cited by: 3
Authors
Astsatryan, Hrachya [1]
Lalayan, Arthur [2]
Kocharyan, Aram [3]
Hagimont, Daniel [3]
Affiliations
[1] Natl Acad Sci Armenia, Inst Informat & Automat Problems, 1 Paruyr Sevak Str, Yerevan 0014, Armenia
[2] Natl Polytech Univ Armenia, 105,Teryan Str, Yerevan 0009, Armenia
[3] Univ Fed Toulouse Midi Pyrenees Toulouse, F-31000 Toulouse 7, France
Source
SCALABLE COMPUTING-PRACTICE AND EXPERIENCE | 2021 / Vol. 22 / Issue 04
Keywords
Hadoop; Spark; MapReduce; data compression; in-memory file system
DOI
10.12694/scpe.v22i4.1945
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
The MapReduce framework manages Big Data sets by splitting large datasets into distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used in Big Data processing to reduce resource-intensive I/O operations and correspondingly improve the I/O rate. The article presents a performance-efficient, modular, configurable, and robust decision-making service that relies on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules: it predicts the execution time of a given job based on metrics and recommends the best configuration parameters to improve the performance of the Hadoop and Spark frameworks. Several CPU- and data-intensive applications and micro-benchmarks, including Log Analyzer, WordCount, and K-Means, have been evaluated to improve the performance.
Pages: 401-412
Page count: 12