EcoFlow: Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators

Cited by: 1
Authors
Orosa, Lois [1 ,2 ]
Koppula, Skanda [3 ]
Umuroglu, Yaman [4 ,5 ]
Kanellopoulos, Konstantinos [1 ]
Gomez-Luna, Juan [1 ]
Blott, Michaela
Vissers, Kees
Mutlu, Onur [1 ]
Affiliations
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
[2] Galicia Supercomp Ctr, Santiago De Compostela 15705, Spain
[3] DeepMind, London EC4A 3TW, England
[4] AMD, Santa Clara, CA 95054 USA
[5] AMD, Santa Clara, CA 95054 USA
Keywords
Convolutional neural networks; Training; Computer architecture; Arrays; Kernel; Generative adversarial networks; Speech recognition; Hardware accelerators
DOI
10.1109/TC.2023.3272282
CLC Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. We find that commonly-used low-power CNN inference accelerators are not optimized for either of these convolutional kernels. Dilated and transposed convolutions introduce significant zero padding when mapped to the underlying spatial architecture, significantly degrading performance and energy efficiency. Existing approaches that address this issue require significant design changes to the otherwise simple, efficient, and well-adopted architectures used to compute direct convolutions. To address this challenge, we propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions. These algorithms are tailored to execute efficiently on existing low-cost, small-scale spatial architectures and require minimal changes to existing accelerators. At its core, EcoFlow eliminates zero padding through careful dataflow orchestration and data mapping tailored to the spatial architecture. We evaluate EcoFlow on CNN training workloads and Generative Adversarial Network (GAN) workloads. Experiments in SASiML, our new cycle-accurate simulator, show that, using a common CNN inference accelerator, EcoFlow 1) reduces end-to-end CNN training time by 7-85%, and 2) improves end-to-end GAN training performance by 29-42%, compared to state-of-the-art CNN dataflows.
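The zero-padding inefficiency the abstract describes can be sketched in a few lines of NumPy. A stride-2 transposed convolution is commonly lowered to a direct convolution over a zero-dilated input, so a large fraction of multiply-accumulate operands on a spatial array are zeros. The `zero_dilate` helper and the stride value below are illustrative assumptions, not the paper's dataflow or API:

```python
import numpy as np

def zero_dilate(x, stride):
    """Insert (stride - 1) zeros between neighboring elements of a 2D input,
    as done when lowering a transposed convolution to a direct convolution."""
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    out[::stride, ::stride] = x  # original values land on a sparse grid
    return out

x = np.arange(1.0, 5.0).reshape(2, 2)   # tiny 2x2 input feature map
xd = zero_dilate(x, stride=2)           # 3x3 dilated input, mostly zeros
zero_fraction = 1.0 - x.size / xd.size  # fraction of wasted multiplications
```

Here 5 of the 9 entries fed to the convolution are zeros, i.e. over half of the MACs a naive mapping performs multiply by zero; at realistic feature-map sizes and strides this waste is what EcoFlow's dataflows are designed to eliminate.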
Pages: 2275-2289
Page count: 15
Related Papers
50 records in total
  • [21] A New Zero-Overhead Test Method for Low-Power AI Accelerators
    Lee, Sangjun
    Park, Jongho
    Park, Sungwhan
    Kim, Hyemin
    Kang, Sungho
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (05) : 2649 - 2653
  • [22] Enabling Timing Error Resilience for Low-Power Systolic-Array Based Deep Learning Accelerators
    Zhang, Jeff
    Ghodsi, Zahra
    Garg, Siddharth
    Rangineni, Kartheek
    [J]. IEEE DESIGN & TEST, 2020, 37 (02) : 93 - 102
  • [23] Enhancement of Convolutional Neural Network Hardware Accelerators Efficiency Using Sparsity Optimization Framework
    Kurapati, Hemalatha
    Ramachandran, Sakthivel
    [J]. IEEE ACCESS, 2024, 12 : 86034 - 86042
  • [24] PRUNIX: Non-Ideality Aware Convolutional Neural Network Pruning for Memristive Accelerators
    Al-Shaarawy, Ali
    Amirsoleimani, Amirali
    Genov, Roman
    [J]. 2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 1299 - 1303
  • [25] Bin-Specific Quantization in Spectral-Domain Convolutional Neural Network Accelerators
    Park, Jinho
    Lee, Jaewon
    Kim, Gain
    Bae, Hyeon-Min
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 407 - 410
  • [26] DivNet: Efficient Convolutional Neural Network via Multilevel Hierarchical Architecture Design
    Kaddar, Bachir
    Fizazi, Hadria
    Hernandez-Cabronero, Miguel
    Sanchez, Victor
    Serra-Sagrista, Joan
    [J]. IEEE ACCESS, 2021, 9 : 105892 - 105901
  • [27] A Configurable and Versatile Architecture for Low Power, Energy Efficient Hardware Acceleration of Convolutional Neural Networks
    Christensen, Steinar Thune
    Aunet, Snorre
    Qadir, Omer
[J]. 2019 IEEE NORDIC CIRCUITS AND SYSTEMS CONFERENCE (NORCAS) - NORCHIP AND INTERNATIONAL SYMPOSIUM OF SYSTEM-ON-CHIP (SOC), 2019
  • [28] Light Convolutional Neural Network for Digital Predistortion of Radio Frequency Power Amplifiers
    Xie, Qian
    Wang, Yong
    Ding, Jianyang
    Niu, Jiajun
    [J]. IEEE COMMUNICATIONS LETTERS, 2024, 28 (10) : 2377 - 2381
  • [29] NLCMAP: A FRAMEWORK FOR THE EFFICIENT MAPPING OF NON-LINEAR CONVOLUTIONAL NEURAL NETWORKS ON FPGA ACCELERATORS
    Aiello, Giuseppe
    Bussolino, Beatrice
    Valpreda, Emanuele
    Roch, Massimo Ruo
    Masera, Guido
    Martina, Maurizio
    Marsi, Stefano
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 926 - 930
  • [30] Efficient Learning Rate Adaptation for Convolutional Neural Network Training
    Georgakopoulos, Spiros V.
    Plagianakos, Vassilis P.
[J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019