EcoFlow: Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators

Cited by: 0
Authors:
Orosa, Lois [1 ,2 ]
Koppula, Skanda [3 ]
Umuroglu, Yaman [4 ,5 ]
Kanellopoulos, Konstantinos [1 ]
Gomez-Luna, Juan [1 ]
Blott, Michaela
Vissers, Kees
Mutlu, Onur [1 ]
Affiliations:
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
[2] Galicia Supercomp Ctr, Santiago De Compostela 15705, Spain
[3] DeepMind, London EC4A 3TW, England
[4] Santa Clara, CA 95054 USA
[5] Santa Clara, CA 95054 USA
Keywords:
Convolutional neural networks; Training; Computer architecture; Arrays; Kernel; Generative adversarial networks; Speech recognition; Hardware accelerators
DOI: 10.1109/TC.2023.3272282
CLC number: TP3 [computing technology, computer technology]
Discipline code: 0812
Abstract:
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and in inference for applications such as image segmentation and high-resolution image generation. We find that commonly used low-power CNN inference accelerators are not optimized for either of these convolutional kernels. Dilated and transposed convolutions introduce significant zero padding when mapped to the underlying spatial architecture, significantly degrading performance and energy efficiency. Existing approaches that address this issue require significant design changes to the otherwise simple, efficient, and well-adopted architectures used to compute direct convolutions. To address this challenge, we propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions. These algorithms are tailored to execute efficiently on existing low-cost, small-scale spatial architectures and require minimal changes to existing accelerators. At its core, EcoFlow eliminates zero padding through careful dataflow orchestration and data mapping tailored to the spatial architecture. We evaluate EcoFlow on CNN training and Generative Adversarial Network (GAN) training workloads. Experiments in SASiML, our new cycle-accurate simulator, show that, on a common CNN inference accelerator, EcoFlow 1) reduces end-to-end CNN training time by 7-85%, and 2) improves end-to-end GAN training performance by 29-42%, compared to state-of-the-art CNN dataflows.
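The zero padding the abstract refers to can be illustrated with a minimal sketch (not EcoFlow's dataflow itself, just the standard lowering it avoids): a transposed convolution with stride s is commonly mapped to a direct convolution by inserting s-1 zeros between input elements, so on a spatial accelerator every multiply against an inserted zero is wasted work. The `zero_insert` helper below is hypothetical, written only to show the fraction of zeros this lowering introduces.

```python
import numpy as np

def zero_insert(x, stride):
    """Insert (stride - 1) zeros between consecutive elements of a 2-D map,
    as done when lowering a strided transposed convolution to a direct one."""
    h, w = x.shape
    out = np.zeros((h + (h - 1) * (stride - 1),
                    w + (w - 1) * (stride - 1)), dtype=x.dtype)
    out[::stride, ::stride] = x  # original values land on a strided grid
    return out

x = np.arange(1, 10, dtype=np.float32).reshape(3, 3)
up = zero_insert(x, stride=2)        # 5x5 map, mostly zeros
zero_frac = 1.0 - x.size / up.size   # fraction of inserted (wasted) zeros
print(up.shape, round(zero_frac, 2))
```

Here 64% of the upsampled map is inserted zeros, so a direct-convolution dataflow spends roughly two thirds of its multiply-accumulates on values known to be zero; this is the inefficiency the proposed dataflows eliminate.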
Pages: 2275-2289 (15 pages)
Related papers (showing 10 of 50):
  • [1] Quantised Neural Network Accelerators for Low-Power IDS in Automotive Networks
    Khandelwal, Shashwat
    Walsh, Anneliese
    Shreejith, Shanker
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023
  • [2] CoopNet: Cooperative Convolutional Neural Network for Low-Power MCUs
    Mocerino, Luca
    Calimera, Andrea
    2019 26TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), 2019, : 414 - 417
  • [3] DEF: Differential Encoding of Featuremaps for Low Power Convolutional Neural Network Accelerators
    Montgomerie-Corcoran, Alexander
    Bouganis, Christos-Savvas
    2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2021, : 703 - 708
  • [4] Low-Power Convolutional Recurrent Neural Network For Monaural Speech Enhancement
    Gao, Fei
    Guan, Haixin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 559 - 563
  • [5] Considerations of Integrating Computing-In-Memory and Processing-In-Sensor into Convolutional Neural Network Accelerators for Low-Power Edge Devices
    Tang, Kea-Tiong
    Wei, Wei-Chen
    Yeh, Zuo-Wei
    Hsu, Tzu-Hsiang
    Chiu, Yen-Cheng
    Xue, Cheng-Xin
    Kuo, Yu-Chun
    Wen, Tai-Hsing
    Ho, Mon-Shu
    Lo, Chung-Chuan
    Liu, Ren-Shuo
    Hsieh, Chih-Cheng
    Chang, Meng-Fan
    2019 SYMPOSIUM ON VLSI CIRCUITS, 2019, : T166 - T167
  • [6] Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
    Reagen, Brandon
    Whatmough, Paul
    Adolf, Robert
    Rama, Saketh
    Lee, Hyunkwang
    Lee, Sae Kyu
    Hernandez-Lobato, Jose Miguel
    Wei, Gu-Yeon
    Brooks, David
    2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016, : 267 - 278
  • [7] Low-Power Convolutional Neural Network Processor for a Face-Recognition System
    Bong, Kyeongryeol
    Choi, Sungpill
    Kim, Changhyeon
    Yoo, Hoi-Jun
    IEEE MICRO, 2017, 37 (06) : 30 - 38
  • [8] A convolutional neural network tolerant of synaptic faults for low-power analog hardware
    Fieres, Johannes
    Meier, Karlheinz
    Schemmel, Johannes
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, PROCEEDINGS, 2006, 4087 : 122 - 132
  • [9] Highly Efficient Test Architecture for Low-Power AI Accelerators
    Ibtesam, Muhammad
    Solangi, Umair Saeed
    Kim, Jinuk
    Ansari, Muhammad Adil
    Park, Sungju
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (08) : 2728 - 2738
  • [10] An efficient loop tiling framework for convolutional neural network inference accelerators
    Huang, Hongmin
    Hu, Xianghong
    Li, Xueming
    Xiong, Xiaoming
    IET CIRCUITS DEVICES & SYSTEMS, 2022, 16 (01) : 116 - 123