Energy-based tuning of convolutional neural networks on multi-GPUs

Cited by: 9
Authors
Castro, F. M. [1 ]
Guil, N. [1 ]
Marin-Jimenez, M. J. [2 ]
Perez-Serrano, J. [1 ]
Ujaldon, M. [1 ]
Affiliations
[1] Univ Malaga, Comp Architecture Dept, E-29071 Malaga, Spain
[2] Univ Cordoba, Dept Comp & Numer Anal, Cordoba, Spain
Keywords
CNN; deep learning; GPU; HPC; low-power; recognition; gait; features
DOI
10.1002/cpe.4786
CLC number
TP31 [Computer software]
Discipline code
081202; 0835
Abstract
Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs demonstrated remarkable skill at accelerating their challenging computational requirements. Within this context, Convolutional Neural Network (CNN) models constitute a representative example of success on a wide set of complex applications, particularly on datasets where the target can be represented through a hierarchy of local features of increasing semantic complexity. In most real scenarios, the roadmap to improved results relies on CNN settings involving brute-force computation, and researchers have lately proven Nvidia GPUs to be among the best hardware counterparts for acceleration. Our work complements those findings with an energy study of critical parameters for the deployment of CNNs on flagship image and video applications, i.e., object recognition and people identification by gait, respectively. We evaluate energy consumption on four different networks based on the two most popular ones (ResNet/AlexNet), i.e., a ResNet (167 layers), a 2D CNN (15 layers), a CaffeNet (25 layers), and a ResNetIm (94 layers), using batch sizes of 64, 128, and 256, and then correlate those measurements with speed-up and accuracy to determine optimal settings. Experimental results on a multi-GPU server equipped with twin Maxwell and twin Pascal Titan X GPUs demonstrate that energy correlates with performance and that Pascal can achieve up to 40% gains over Maxwell. Larger batch sizes extend performance gains and energy savings, but we have to keep an eye on accuracy, which sometimes shows a preference for small batches. We expect this work to provide preliminary guidance for a wide set of CNN and DL applications in modern HPC times, where the GFLOPS/W ratio constitutes the primary goal.
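The abstract's central metric, energy per training run and the resulting GFLOPS/W ratio, is obtained by integrating sampled GPU power over the runtime. As a minimal sketch (not the authors' instrumentation), the helper below integrates hypothetical power samples with the trapezoidal rule and derives an efficiency figure; the sample values and sampling interval are illustrative assumptions, not measurements from the paper.

```python
def energy_joules(power_w, dt_s):
    """Integrate a list of power samples (watts), taken every dt_s
    seconds, into energy in joules using the trapezoidal rule."""
    if len(power_w) < 2:
        return 0.0
    return sum((a + b) / 2.0 * dt_s for a, b in zip(power_w, power_w[1:]))


def gflops_per_watt(total_gflop, energy_j, runtime_s):
    """Efficiency ratio: sustained GFLOPS divided by average power draw."""
    avg_power_w = energy_j / runtime_s
    return (total_gflop / runtime_s) / avg_power_w


# Hypothetical 1 Hz power trace for one training batch (watts).
samples = [180.0, 220.0, 240.0, 230.0, 190.0]
energy = energy_joules(samples, 1.0)  # 875.0 J over 4 s of integration
```

In practice, such traces would come from a power-monitoring interface (e.g., NVML power queries or an external current sensor such as the INA219 cited in the related work), sampled at a fixed rate while the training loop runs.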
Pages: 22