Replica-aware data recovery performance improvement for Hadoop system with NVM

Authors
Li, Xin [1]
Li, Huijie [1]
Lu, Youyou [2]
Zhao, Yanchao [1]
Qin, Xiaolin [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Data recovery; HDFS; MapReduce; Non-volatile memory; Performance tuning; CLUSTER; MEMORY;
DOI
10.1007/s42514-021-00066-9
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Non-volatile memory (NVM) is a promising device for storing data and accelerating big data analysis because of its excellent I/O performance. However, we find that simply replacing hard disk drives (HDDs) with NVM does not bring the expected performance improvement. In this paper, we take data recovery in the Hadoop Distributed File System (HDFS) as a case study to investigate how to take advantage of the performance of NVM. We analyze the data recovery mechanism in HDFS and find that the configuration of replication tasks in the DataNode can significantly affect data recovery. We conduct extensive analysis and experiments on tuning this configuration and report several interesting findings. With the new configuration, we improve data recovery performance by 17% to 71%. The optimized configuration also improves the execution performance of MapReduce jobs by 28% to 59%. We further find that sudden data recovery causes disordered competition for network resources, which reduces the performance of MapReduce jobs. Hence, we present a priority-aware multi-stage data recovery method, which improves the performance of MapReduce jobs by an additional 32.5%.
Pages: 144-156
Page count: 13