Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance

Cited by: 15
Authors
Agarwal, Megha [1 ]
Singhvi, Divyansh [1 ]
Malakar, Preeti [1 ]
Byna, Suren [2 ]
Affiliations
[1] IIT Kanpur, Kanpur, Uttar Pradesh, India
[2] Lawrence Berkeley Natl Lab, Berkeley, CA USA
Source
PROCEEDINGS OF PDSW 2019: 2019 IEEE/ACM FOURTH INTERNATIONAL PARALLEL DATA SYSTEMS WORKSHOP (PDSW) | 2019
Keywords
Parallel I/O; auto-tuning; active learning; performance prediction; machine learning;
DOI
10.1109/PDSW49588.2019.00007
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Parallel I/O is an indispensable part of scientific applications, and the current parallel I/O software stack exposes many tunable parameters. Although changing these parameters can improve I/O performance many-fold, application developers usually fall back on default values because tuning is cumbersome and requires expertise. We propose two auto-tuning models based on active learning that recommend a good set of parameter values (currently tested with Lustre parameters and MPI-IO hints) for an application on a given system. Both models use Bayesian optimization to find parameter values by minimizing an objective function. The first model runs the application to evaluate candidate configurations, whereas the second model uses an I/O performance prediction model instead, which significantly reduces the training time compared to the first model (e.g., from 800 seconds to 18 seconds). To keep the tuning process generic, we consider both read and write performance, and both models provide the flexibility to focus on improving either one. We validated our models using an I/O benchmark (IOR) and three scientific application I/O kernels (S3D-IO, BT-IO, and GenericIO) on two supercomputers (HPC2010 and Cori). Using the two models, we achieve up to 11x higher I/O bandwidth than with the default parameters, including up to 3x improvement for 37 TB writes (corresponding to 1 billion particles in GenericIO) and up to 3.2x higher bandwidth for 4.8 TB of non-contiguous I/O in the BT-IO benchmark.
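The abstract describes the tuning loop only at a high level. The following is a minimal, hypothetical sketch (not the authors' implementation) of a Bayesian-optimization loop in the spirit of the first model: it searches over an illustrative space of Lustre striping settings and one MPI-IO hint, runs IOR to measure write bandwidth for each candidate, and minimizes the negated bandwidth. The parameter ranges, process count, file sizes, and the use of scikit-optimize are assumptions, not details from the paper.

# Minimal sketch (assumed, not from the paper): Bayesian optimization over a few
# Lustre/MPI-IO parameters, using measured IOR write bandwidth as the objective.
import os
import re
import subprocess

from skopt import gp_minimize            # scikit-optimize's Gaussian-process optimizer
from skopt.space import Categorical, Integer

TEST_DIR = "/lustre/scratch/io_tune"      # hypothetical Lustre-mounted test directory

# Illustrative search space: stripe count, stripe size (MB), collective-write hint.
space = [
    Integer(1, 64, name="stripe_count"),
    Categorical([1, 4, 16, 64], name="stripe_size_mb"),
    Categorical(["enable", "disable"], name="romio_cb_write"),
]

def measure_write_bandwidth(stripe_count, stripe_size_mb, cb_write):
    """Apply one candidate configuration and return IOR's measured write bandwidth (MiB/s)."""
    # Set Lustre striping on the target directory.
    subprocess.run(["lfs", "setstripe", "-c", str(stripe_count),
                    "-S", f"{stripe_size_mb}M", TEST_DIR], check=True)
    # Pass the MPI-IO hint through ROMIO's hints-file mechanism.
    with open("romio_hints", "w") as f:
        f.write(f"romio_cb_write {cb_write}\n")
    env = dict(os.environ, ROMIO_HINTS="romio_hints")
    out = subprocess.run(
        ["mpirun", "-np", "32", "ior", "-a", "MPIIO", "-w",
         "-t", "1m", "-b", "256m", "-o", f"{TEST_DIR}/testfile"],
        capture_output=True, text=True, env=env, check=True).stdout
    # Extract the "Max Write" figure from IOR's summary output.
    match = re.search(r"Max Write:\s+([\d.]+)\s+MiB/sec", out)
    return float(match.group(1)) if match else 0.0

def objective(params):
    # gp_minimize minimizes, so return the negated bandwidth to maximize it.
    return -measure_write_bandwidth(*params)

# Active-learning loop: each evaluation updates a Gaussian-process surrogate
# that chooses the next configuration to try.
result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("Best configuration:", result.x, "bandwidth:", -result.fun, "MiB/s")

In the paper's second model, the expensive measurement step (here, the measure_write_bandwidth call) is replaced by an I/O performance prediction model, which is what reduces the training time from hundreds of seconds to tens of seconds.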
Pages: 20-29
Page count: 10