EcoFlow: Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators

Cited by: 1
Authors
Orosa, Lois [1 ,2 ]
Koppula, Skanda [3 ]
Umuroglu, Yaman [4 ,5 ]
Kanellopoulos, Konstantinos [1 ]
Gomez-Luna, Juan [1 ]
Blott, Michaela
Vissers, Kees
Mutlu, Onur [1 ]
Affiliations
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
[2] Galicia Supercomp Ctr, Santiago De Compostela 15705, Spain
[3] DeepMind, London EC4A 3TW, England
[4] AMD, Santa Clara, CA 95054 USA
[5] AMD, Santa Clara, CA 95054 USA
Keywords
Convolutional neural networks; Training; Computer architecture; Arrays; Kernel; Generative adversarial networks; Speech recognition; Hardware accelerators
DOI
10.1109/TC.2023.3272282
CLC Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. We find that commonly-used low-power CNN inference accelerators are not optimized for either of these convolutional kernels. Dilated and transposed convolutions introduce significant zero padding when mapped to the underlying spatial architecture, significantly degrading performance and energy efficiency. Existing approaches that address this issue require significant design changes to the otherwise simple, efficient, and well-adopted architectures used to compute direct convolutions. To address this challenge, we propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions. These algorithms are tailored to execute efficiently on existing low-cost, small-scale spatial architectures and require minimal changes to existing accelerators. At its core, EcoFlow eliminates zero padding through careful dataflow orchestration and data mapping tailored to the spatial architecture. We evaluate EcoFlow on CNN training workloads and Generative Adversarial Network (GAN) workloads. Experiments in SASiML, our new cycle-accurate simulator, show that, using a common CNN inference accelerator, EcoFlow 1) reduces end-to-end CNN training time by 7-85%, and 2) improves end-to-end GAN training performance by 29-42%, compared to state-of-the-art CNN dataflows.
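The zero-padding inefficiency the abstract describes can be sketched in a few lines of NumPy. A stride-2 transposed convolution is commonly lowered to a direct convolution over a zero-dilated input, so a large fraction of multiply-accumulate operands on a spatial array are zeros. The `zero_dilate` helper and the stride value below are illustrative assumptions, not the paper's dataflow or API:

```python
import numpy as np

def zero_dilate(x, stride):
    """Insert (stride - 1) zeros between neighboring elements of a 2D input,
    as done when lowering a transposed convolution to a direct convolution."""
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    out[::stride, ::stride] = x  # original values land on a sparse grid
    return out

x = np.arange(1.0, 5.0).reshape(2, 2)   # tiny 2x2 input feature map
xd = zero_dilate(x, stride=2)           # 3x3 dilated input, mostly zeros
zero_fraction = 1.0 - x.size / xd.size  # fraction of wasted multiplications
```

Here 5 of the 9 entries fed to the convolution are zeros, i.e. over half of the MACs a naive mapping performs multiply by zero; at realistic feature-map sizes and strides this waste is what EcoFlow's dataflows are designed to eliminate.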
Pages: 2275-2289
Page count: 15
Related Papers
50 records in total
  • [21] A New Zero-Overhead Test Method for Low-Power AI Accelerators
    Lee, Sangjun
    Park, Jongho
    Park, Sungwhan
    Kim, Hyemin
    Kang, Sungho
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (05) : 2649 - 2653
  • [22] Enabling Timing Error Resilience for Low-Power Systolic-Array Based Deep Learning Accelerators
    Zhang, Jeff
    Ghodsi, Zahra
    Garg, Siddharth
    Rangineni, Kartheek
    [J]. IEEE DESIGN & TEST, 2020, 37 (02) : 93 - 102
  • [23] Enhancement of Convolutional Neural Network Hardware Accelerators Efficiency Using Sparsity Optimization Framework
    Kurapati, Hemalatha
    Ramachandran, Sakthivel
    [J]. IEEE ACCESS, 2024, 12 : 86034 - 86042
  • [24] PRUNIX: Non-Ideality Aware Convolutional Neural Network Pruning for Memristive Accelerators
    Al-Shaarawy, Ali
    Amirsoleimani, Amirali
    Genov, Roman
    [J]. 2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 1299 - 1303
  • [25] Bin-Specific Quantization in Spectral-Domain Convolutional Neural Network Accelerators
    Park, Jinho
    Lee, Jaewon
    Kim, Gain
    Bae, Hyeon-Min
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 407 - 410
  • [26] DivNet: Efficient Convolutional Neural Network via Multilevel Hierarchical Architecture Design
    Kaddar, Bachir
    Fizazi, Hadria
    Hernandez-Cabronero, Miguel
    Sanchez, Victor
    Serra-Sagrista, Joan
    [J]. IEEE ACCESS, 2021, 9 : 105892 - 105901
  • [27] A Configurable and Versatile Architecture for Low Power, Energy Efficient Hardware Acceleration of Convolutional Neural Networks
    Christensen, Steinar Thune
    Aunet, Snorre
    Qadir, Omer
[J]. 2019 IEEE NORDIC CIRCUITS AND SYSTEMS CONFERENCE (NORCAS) - NORCHIP AND INTERNATIONAL SYMPOSIUM OF SYSTEM-ON-CHIP (SOC), 2019
  • [28] Light Convolutional Neural Network for Digital Predistortion of Radio Frequency Power Amplifiers
    Xie, Qian
    Wang, Yong
    Ding, Jianyang
    Niu, Jiajun
    [J]. IEEE COMMUNICATIONS LETTERS, 2024, 28 (10) : 2377 - 2381
  • [29] NLCMAP: A FRAMEWORK FOR THE EFFICIENT MAPPING OF NON-LINEAR CONVOLUTIONAL NEURAL NETWORKS ON FPGA ACCELERATORS
    Aiello, Giuseppe
    Bussolino, Beatrice
    Valpreda, Emanuele
    Roch, Massimo Ruo
    Masera, Guido
    Martina, Maurizio
    Marsi, Stefano
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 926 - 930
  • [30] Efficient Learning Rate Adaptation for Convolutional Neural Network Training
    Georgakopoulos, Spiros V.
    Plagianakos, Vassilis P.
[J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019