Experiences of Converging Big Data Analytics Frameworks with High Performance Computing Systems

Cited by: 2
Authors
Cheng, Peng [1, 2]
Lu, Yutong [3]
Du, Yunfei [3]
Chen, Zhiguang [1, 2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] State Key Lab High Performance Comp, Changsha, Peoples R China
[3] Natl Supercomp Ctr Guangzhou NSCC GZ, Guangzhou, Peoples R China
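Source
SUPERCOMPUTING FRONTIERS, SCFA 2018 | 2018 / Vol. 10776
Funding
National Key Research and Development Program of China
Keywords
High performance computing; Big data; Convergence; File system; Hadoop
DOI
10.1007/978-3-319-69953-0_6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract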
With the rapid development of big data analytics frameworks, many existing high performance computing (HPC) facilities are evolving new capabilities to support big data analytics workloads. However, because workload characteristics and architectural optimization objectives differ, migrating data-intensive applications to HPC systems geared toward traditional compute-intensive applications presents a new challenge. In this paper, we address the critical question of how to accelerate complex applications that contain both data-intensive and compute-intensive workloads on the Tianhe-2 system by deploying an in-memory file system as data access middleware; we also characterize the impact of storage architecture on data-intensive MapReduce workloads when using Lustre as the underlying file system. Based on our characterization and findings on performance behavior, we propose a shared map output shuffle strategy and a file metadata cache layer to alleviate the metadata bottleneck. An evaluation of these optimization techniques shows up to a 17% performance benefit for data-intensive workloads.
Pages: 90-106
Number of pages: 17