EcoFlow: Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators

Cited by: 0
Authors:
Orosa, Lois [1 ,2 ]
Koppula, Skanda [3 ]
Umuroglu, Yaman [4 ,5 ]
Kanellopoulos, Konstantinos [1 ]
Gomez-Luna, Juan [1 ]
Blott, Michaela
Vissers, Kees
Mutlu, Onur [1 ]
Affiliations:
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
[2] Galicia Supercomp Ctr, Santiago De Compostela 15705, Spain
[3] DeepMind, London EC4A 3TW, England
[4] Santa Clara, CA 95054 USA
[5] Santa Clara, CA 95054 USA
Keywords:
Convolutional neural networks; Training; Computer architecture; Arrays; Kernel; Generative adversarial networks; Speech recognition; Hardware accelerators
DOI: 10.1109/TC.2023.3272282
CLC number: TP3 [computing technology, computer technology]
Discipline code: 0812
Abstract:
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and in inference for applications such as image segmentation and high-resolution image generation. We find that commonly used low-power CNN inference accelerators are not optimized for either of these convolutional kernels. Dilated and transposed convolutions introduce significant zero padding when mapped to the underlying spatial architecture, significantly degrading performance and energy efficiency. Existing approaches that address this issue require significant design changes to the otherwise simple, efficient, and well-adopted architectures used to compute direct convolutions. To address this challenge, we propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions. These algorithms are tailored to execute efficiently on existing low-cost, small-scale spatial architectures and require minimal changes to existing accelerators. At its core, EcoFlow eliminates zero padding through careful dataflow orchestration and data mapping tailored to the spatial architecture. We evaluate EcoFlow on CNN training and Generative Adversarial Network (GAN) training workloads. Experiments in SASiML, our new cycle-accurate simulator, show that, on a common CNN inference accelerator, EcoFlow 1) reduces end-to-end CNN training time by 7-85%, and 2) improves end-to-end GAN training performance by 29-42%, compared to state-of-the-art CNN dataflows.
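The zero padding the abstract refers to can be illustrated with a minimal sketch (not EcoFlow's dataflow itself, just the standard lowering it avoids): a transposed convolution with stride s is commonly mapped to a direct convolution by inserting s-1 zeros between input elements, so on a spatial accelerator every multiply against an inserted zero is wasted work. The `zero_insert` helper below is hypothetical, written only to show the fraction of zeros this lowering introduces.

```python
import numpy as np

def zero_insert(x, stride):
    """Insert (stride - 1) zeros between consecutive elements of a 2-D map,
    as done when lowering a strided transposed convolution to a direct one."""
    h, w = x.shape
    out = np.zeros((h + (h - 1) * (stride - 1),
                    w + (w - 1) * (stride - 1)), dtype=x.dtype)
    out[::stride, ::stride] = x  # original values land on a strided grid
    return out

x = np.arange(1, 10, dtype=np.float32).reshape(3, 3)
up = zero_insert(x, stride=2)        # 5x5 map, mostly zeros
zero_frac = 1.0 - x.size / up.size   # fraction of inserted (wasted) zeros
print(up.shape, round(zero_frac, 2))
```

Here 64% of the upsampled map is inserted zeros, so a direct-convolution dataflow spends roughly two thirds of its multiply-accumulates on values known to be zero; this is the inefficiency the proposed dataflows eliminate.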
Pages: 2275-2289 (15 pages)
Related papers (showing 10 of 50):
  • [1] Quantised Neural Network Accelerators for Low-Power IDS in Automotive Networks
    Khandelwal, Shashwat
    Walsh, Anneliese
    Shreejith, Shanker
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023
  • [2] CoopNet: Cooperative Convolutional Neural Network for Low-Power MCUs
    Mocerino, Luca
    Calimera, Andrea
    2019 26TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), 2019, : 414 - 417
  • [3] DEF: Differential Encoding of Featuremaps for Low Power Convolutional Neural Network Accelerators
    Montgomerie-Corcoran, Alexander
    Bouganis, Christos-Savvas
    2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2021, : 703 - 708
  • [4] Low-Power Convolutional Recurrent Neural Network For Monaural Speech Enhancement
    Gao, Fei
    Guan, Haixin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 559 - 563
  • [5] Considerations of Integrating Computing-In-Memory and Processing-In-Sensor into Convolutional Neural Network Accelerators for Low-Power Edge Devices
    Tang, Kea-Tiong
    Wei, Wei-Chen
    Yeh, Zuo-Wei
    Hsu, Tzu-Hsiang
    Chiu, Yen-Cheng
    Xue, Cheng-Xin
    Kuo, Yu-Chun
    Wen, Tai-Hsing
    Ho, Mon-Shu
    Lo, Chung-Chuan
    Liu, Ren-Shuo
    Hsieh, Chih-Cheng
    Chang, Meng-Fan
    2019 SYMPOSIUM ON VLSI CIRCUITS, 2019, : T166 - T167
  • [6] Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
    Reagen, Brandon
    Whatmough, Paul
    Adolf, Robert
    Rama, Saketh
    Lee, Hyunkwang
    Lee, Sae Kyu
    Hernandez-Lobato, Jose Miguel
    Wei, Gu-Yeon
    Brooks, David
    2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, : 267 - 278
  • [7] Low-Power Convolutional Neural Network Processor for a Face-Recognition System
    Bong, Kyeongryeol
    Choi, Sungpill
    Kim, Changhyeon
    Yoo, Hoi-Jun
    IEEE MICRO, 2017, 37 (06) : 30 - 38
  • [8] A convolutional neural network tolerant of synaptic faults for low-power analog hardware
    Fieres, Johannes
    Meier, Karlheinz
    Schemmel, Johannes
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, PROCEEDINGS, 2006, 4087 : 122 - 132
  • [9] Highly Efficient Test Architecture for Low-Power AI Accelerators
    Ibtesam, Muhammad
    Solangi, Umair Saeed
    Kim, Jinuk
    Ansari, Muhammad Adil
    Park, Sungju
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (08) : 2728 - 2738
  • [10] An efficient loop tiling framework for convolutional neural network inference accelerators
    Huang, Hongmin
    Hu, Xianghong
    Li, Xueming
    Xiong, Xiaoming
    IET CIRCUITS DEVICES & SYSTEMS, 2022, 16 (01) : 116 - 123