I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning

Cited by: 50
Authors
Chowdhury, Fahim [1 ]
Zhu, Yue [1 ]
Heer, Todd [2 ]
Paredes, Saul [1 ]
Moody, Adam [2 ]
Goldstone, Robin [2 ]
Mohror, Kathryn [2 ]
Yu, Weikuan [1 ]
Affiliations
[1] Florida State Univ, Tallahassee, FL 32306 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA USA
Source
PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019) | 2019
Funding
U.S. National Science Foundation
DOI
10.1145/3337821.3337902
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Parallel File Systems (PFSs) are frequently deployed on leadership High Performance Computing (HPC) systems to ensure efficient I/O, persistent storage, and scalable performance. Emerging Deep Learning (DL) applications impose new I/O and storage requirements on HPC systems with their batched input of small, randomly accessed files. PFSs therefore need commensurate features to meet the needs of DL applications. BeeGFS is a recently emerging PFS that has attracted attention from both research and industry because of its performance, scalability, and ease of use. In this paper, we present the architectural and system features of BeeGFS and, with an emphasis on systematic performance analysis, evaluate it experimentally using cutting-edge I/O, metadata, and DL application benchmarks. In particular, we use the AlexNet and ResNet-50 models to classify the ImageNet dataset with the Livermore Big Artificial Neural Network Toolkit (LBANN), and an ImageNet data reader pipeline atop TensorFlow and Horovod. Through extensive performance characterization of BeeGFS, our study documents how to leverage BeeGFS for emerging DL applications.
Pages: 10