IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop

Cited by: 4
Authors
Kavitha, C. [1 ]
Srividhya, S. R. [1 ]
Lai, Wen-Cheng [2 ,3 ]
Mani, Vinodhini [1 ]
Affiliations
[1] Sathyabama Inst Sci & Technol, Dept Comp Sci & Engn, Chennai 600119, Tamil Nadu, India
[2] Natl Yunlin Univ Sci & Technol, Bachelor Program Ind Projects, Touliu 640301, Yunlin, Taiwan
[3] Natl Yunlin Univ Sci & Technol, Dept Elect Engn, Touliu 640301, Yunlin, Taiwan
Keywords
big data; combiner; distributed storage; hadoop; mapreduce; sort; task failure resilience; wordcount
DOI
10.3390/electronics11101599
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hadoop is a framework for storing and processing huge amounts of data. With the Hadoop Distributed File System (HDFS), large data sets can be managed on commodity hardware. MapReduce is a programming model for processing vast amounts of data in parallel through a map phase and a reduce phase. In a standard MapReduce job, a very large amount of intermediate data is transferred from the Mapper to the Reducer without any filtering or aggregation, which overloads network bandwidth. In this paper, we introduce an algorithm called Inner MAPping Combiner (IMapC) for the map phase. Within the Mapper, this algorithm combines the values of recurring keys before they are emitted. To test the efficiency of the algorithm, different approaches were compared. According to the tests, MapReduce programs implemented with IMapC and the Default Combiner (DC) are 70% more efficient than those implemented without a combiner. This work can be integrated with MapReduce to make computations significantly faster.
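The central idea described above, combining the values of recurring keys inside the Mapper so that fewer intermediate pairs reach the shuffle, follows the general in-mapper combining pattern. The sketch below simulates that pattern in plain Python on a WordCount example instead of using Hadoop's Java API. It is not the authors' IMapC implementation. The function names (`plain_map`, `in_mapper_combining_map`, `reduce_counts`) and the `flush_threshold` parameter are hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def plain_map(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Classic WordCount mapper: emit (word, 1) for every token, with no combining."""
    for line in lines:
        for word in line.split():
            yield word, 1


def in_mapper_combining_map(
    lines: Iterable[str], flush_threshold: int = 10_000
) -> Iterator[Tuple[str, int]]:
    """In-mapper combining sketch: accumulate partial counts per key in a local
    dict and emit each distinct key once, instead of once per occurrence.

    flush_threshold (hypothetical) bounds memory: when the buffer holds that
    many distinct keys, it is emitted early and cleared.
    """
    buffer: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            buffer[word] = buffer.get(word, 0) + 1
            if len(buffer) >= flush_threshold:
                yield from buffer.items()
                buffer.clear()
    # Emit whatever remains at the end of the input split
    # (in Hadoop this would typically happen in Mapper.cleanup()).
    yield from buffer.items()


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reducer: sum all values per key, whether they arrive as 1s or partial sums."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


split = ["the cat sat", "the cat ran", "the dog sat"]
plain = list(plain_map(split))
combined = list(in_mapper_combining_map(split))
print(len(plain), len(combined))  # 9 5
assert reduce_counts(plain) == reduce_counts(combined)
```

On this toy split, the combining mapper emits 5 pairs instead of 9 while the reducer produces identical totals. The same reduction is what lowers shuffle traffic on real workloads. The flush threshold is a design trade-off: a small value limits mapper memory but may emit the same key more than once, which stays correct because the reducer sums partial counts anyway.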
Pages: 16