Improving sparse data movement performance using multiple paths on the Blue Gene/Q supercomputer

Cited by: 4

Authors:

Bui, Huy [1]

Jung, Eun-Sung [2]

Vishwanath, Venkatram [2]

Johnson, Andrew [1]

Leigh, Jason [4]

Papka, Michael E. [3,5]

Affiliations:

[1] Univ Illinois, Elect Visualizat Lab, 842 Taylor St, Chicago, IL 60607 USA

[2] Argonne Natl Lab, Math & Comp Sci, 9700 S Cass Ave, Argonne, IL 60439 USA

[3] Argonne Natl Lab, Argonne Leadership Comp Facil, 9700 S Cass Ave, Argonne, IL 60439 USA

[4] Univ Hawaii, LAVA, 1680 East West Rd, Honolulu, HI 96822 USA

[5] No Illinois Univ, 300 Normal Rd, De Kalb, IL 60115 USA

Source:

PARALLEL COMPUTING | 2016 / Volume 51

Funding:

US National Science Foundation;

Keywords:

Multiple paths; Sparse data movement; Topology-aware aggregation; Data-intensive; Blue Gene/Q;

DOI:

10.1016/j.parco.2015.09.002

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

In situ analysis has been proposed as a promising solution to glean faster insights and reduce the amount of data written to storage. A critical challenge here is that the reduced dataset is typically located on a subset of the nodes and needs to be written out to storage. Data coupling in multiphysics codes also exhibits a sparse data movement pattern wherein data movement occurs among a subset of nodes. We evaluate the performance of data movement for sparse data patterns on the IBM Blue Gene/Q supercomputing system "Mira" and identify performance bottlenecks. We propose a multipath data movement algorithm for sparse data patterns based on an adaptation of a maximum flow algorithm together with breadth-first search that fully exploits all the underlying data paths and I/O nodes to improve data movement. We demonstrate the efficacy of our solutions through a set of microbenchmarks and application benchmarks on Mira scaling up to 131,072 compute cores. The results show that our approach achieves up to 5× improvement in achievable throughput compared with the default mechanisms. © 2015 Elsevier B.V. All rights reserved.
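The abstract names a maximum flow algorithm combined with breadth-first search but gives no details of the authors' adaptation. As a rough illustration of that general combination only, the sketch below is the textbook Edmonds-Karp method (augmenting along BFS-shortest paths), not the paper's algorithm. The function name `max_flow`, the node labels `src`, `a`, `b`, `ion`, and the link capacities are all hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Textbook Edmonds-Karp: augment along BFS-shortest paths until none remain.

    capacity: dict mapping node -> {neighbor: link capacity} (hypothetical input format).
    Returns (total flow, list of (path, flow sent along that path)).
    """
    # Residual graph: copy forward capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    total, paths = 0, []
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, paths  # no augmenting path is left

        # Rebuild the path from sink back to source, then reverse it.
        path = [sink]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        path.reverse()

        # Push as much flow as the tightest link on the path allows.
        bottleneck = min(residual[u][v] for u, v in zip(path, path[1:]))
        for u, v in zip(path, path[1:]):
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck
        paths.append((path, bottleneck))

# Hypothetical toy network: one sender, two intermediate nodes, one I/O node.
links = {"src": {"a": 2, "b": 2}, "a": {"ion": 2, "b": 1}, "b": {"ion": 2}}
total, paths = max_flow(links, "src", "ion")
print(total, paths)
```

In this toy network the sender reaches the I/O node over two disjoint routes, so the flow found is the sum of both routes rather than the capacity of a single default path, which is the intuition behind using several paths at once.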


Pages: 3-16

Page count: 14
