Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance

Cited by: 15
Authors
Agarwal, Megha [1 ]
Singhvi, Divyansh [1 ]
Malakar, Preeti [1 ]
Byna, Suren [2 ]
Affiliations
[1] IIT Kanpur, Kanpur, Uttar Pradesh, India
[2] Lawrence Berkeley Natl Lab, Berkeley, CA USA
Source
PROCEEDINGS OF PDSW 2019: 2019 IEEE/ACM FOURTH INTERNATIONAL PARALLEL DATA SYSTEMS WORKSHOP (PDSW) | 2019
Keywords
Parallel I/O; auto-tuning; active learning; performance prediction; machine learning;
DOI
10.1109/PDSW49588.2019.00007
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Parallel I/O is an indispensable part of scientific applications, and the current parallel I/O software stack exposes many tunable parameters. Although changing these parameters can improve I/O performance many-fold, application developers usually fall back on default values because tuning is cumbersome and requires expertise. We propose two auto-tuning models based on active learning that recommend a good set of parameter values (currently tested with Lustre parameters and MPI-IO hints) for an application on a given system. Both models use Bayesian optimization to find parameter values by minimizing an objective function. The first model runs the application to evaluate candidate configurations, whereas the second model uses an I/O performance prediction model instead, which significantly reduces the training time compared to the first model (e.g., from 800 seconds to 18 seconds). To keep the tuning process generic, we consider both read and write performance, and both models provide the flexibility to focus on improving either one. We validated our models using an I/O benchmark (IOR) and three scientific application I/O kernels (S3D-IO, BT-IO, and GenericIO) on two supercomputers (HPC2010 and Cori). Using the two models, we achieve up to 11x higher I/O bandwidth than with the default parameters, including up to 3x improvement for 37 TB writes (corresponding to 1 billion particles in GenericIO) and up to 3.2x higher bandwidth for 4.8 TB of non-contiguous I/O in the BT-IO benchmark.
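The abstract describes the tuning loop only at a high level. The following is a minimal, hypothetical sketch (not the authors' implementation) of a Bayesian-optimization loop in the spirit of the first model: it searches over an illustrative space of Lustre striping settings and one MPI-IO hint, runs IOR to measure write bandwidth for each candidate, and minimizes the negated bandwidth. The parameter ranges, process count, file sizes, and the use of scikit-optimize are assumptions, not details from the paper.

# Minimal sketch (assumed, not from the paper): Bayesian optimization over a few
# Lustre/MPI-IO parameters, using measured IOR write bandwidth as the objective.
import os
import re
import subprocess

from skopt import gp_minimize            # scikit-optimize's Gaussian-process optimizer
from skopt.space import Categorical, Integer

TEST_DIR = "/lustre/scratch/io_tune"      # hypothetical Lustre-mounted test directory

# Illustrative search space: stripe count, stripe size (MB), collective-write hint.
space = [
    Integer(1, 64, name="stripe_count"),
    Categorical([1, 4, 16, 64], name="stripe_size_mb"),
    Categorical(["enable", "disable"], name="romio_cb_write"),
]

def measure_write_bandwidth(stripe_count, stripe_size_mb, cb_write):
    """Apply one candidate configuration and return IOR's measured write bandwidth (MiB/s)."""
    # Set Lustre striping on the target directory.
    subprocess.run(["lfs", "setstripe", "-c", str(stripe_count),
                    "-S", f"{stripe_size_mb}M", TEST_DIR], check=True)
    # Pass the MPI-IO hint through ROMIO's hints-file mechanism.
    with open("romio_hints", "w") as f:
        f.write(f"romio_cb_write {cb_write}\n")
    env = dict(os.environ, ROMIO_HINTS="romio_hints")
    out = subprocess.run(
        ["mpirun", "-np", "32", "ior", "-a", "MPIIO", "-w",
         "-t", "1m", "-b", "256m", "-o", f"{TEST_DIR}/testfile"],
        capture_output=True, text=True, env=env, check=True).stdout
    # Extract the "Max Write" figure from IOR's summary output.
    match = re.search(r"Max Write:\s+([\d.]+)\s+MiB/sec", out)
    return float(match.group(1)) if match else 0.0

def objective(params):
    # gp_minimize minimizes, so return the negated bandwidth to maximize it.
    return -measure_write_bandwidth(*params)

# Active-learning loop: each evaluation updates a Gaussian-process surrogate
# that chooses the next configuration to try.
result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("Best configuration:", result.x, "bandwidth:", -result.fun, "MiB/s")

In the paper's second model, the expensive measurement step (here, the measure_write_bandwidth call) is replaced by an I/O performance prediction model, which is what reduces the training time from hundreds of seconds to tens of seconds.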
Pages: 20-29
Page count: 10