Experiences of Converging Big Data Analytics Frameworks with High Performance Computing Systems

Cited by: 2

Authors:

Cheng, Peng [1,2]

Lu, Yutong [3]

Du, Yunfei [3]

Chen, Zhiguang [1,2]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China

[2] State Key Lab High Performance Comp, Changsha, Peoples R China

[3] Natl Supercomp Ctr Guangzhou NSCC GZ, Guangzhou, Peoples R China

Source:

SUPERCOMPUTING FRONTIERS, SCFA 2018 | 2018 / Vol. 10776

Funding:

National Key Research and Development Program of China;

Keywords:

High performance computing; Big data; Convergence; File system; Hadoop;

DOI:

10.1007/978-3-319-69953-0_6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of big data analytics frameworks, many existing high performance computing (HPC) facilities are evolving new capabilities to support big data analytics workloads. However, because the two kinds of system differ in workload characteristics and architectural optimization objectives, migrating data-intensive applications to HPC systems that are geared for traditional compute-intensive applications presents a new challenge. In this paper, we address the critical question of how to accelerate complex applications that contain both data-intensive and compute-intensive workloads on the Tianhe-2 system by deploying an in-memory file system as data access middleware. We also characterize the impact of storage architecture on data-intensive MapReduce workloads when using Lustre as the underlying file system. Based on our characterization and findings on the performance behaviors, we propose a shared map output shuffle strategy and a file metadata cache layer to alleviate the impact of the metadata bottleneck. The evaluation of these optimization techniques shows up to 17% performance benefit for data-intensive workloads.


Pages: 90-106

Page count: 17
