EcoFlow: Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators

Cited by: 1
Authors
Orosa, Lois [1 ,2 ]
Koppula, Skanda [3 ]
Umuroglu, Yaman [4 ,5 ]
Kanellopoulos, Konstantinos [1 ]
Gomez-Luna, Juan [1 ]
Blott, Michaela
Vissers, Kees
Mutlu, Onur [1 ]
Affiliations
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
[2] Galicia Supercomp Ctr, Santiago De Compostela 15705, Spain
[3] DeepMind, London EC4A 3TW, England
[4] AMD (formerly Xilinx), Santa Clara, CA 95054 USA
[5] AMD (formerly Xilinx), Santa Clara, CA 95054 USA
Keywords
Convolutional neural networks; Training; Computer architecture; Arrays; Kernel; Generative adversarial networks; Speech recognition; Hardware accelerators
DOI
10.1109/TC.2023.3272282
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. We find that commonly-used low-power CNN inference accelerators are not optimized for these convolutional kernels. Dilated and transposed convolutions introduce significant zero padding when mapped to the underlying spatial architecture, significantly degrading performance and energy efficiency. Existing approaches that address this issue require significant design changes to the otherwise simple, efficient, and well-adopted architectures used to compute direct convolutions. To address this challenge, we propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions. These algorithms are tailored to execute efficiently on existing low-cost, small-scale spatial architectures and require minimal changes to existing accelerators. At its core, EcoFlow eliminates zero padding through careful dataflow orchestration and data mapping tailored to the spatial architecture. We evaluate EcoFlow on CNN training workloads and Generative Adversarial Network (GAN) workloads. Experiments in SASiML, our new cycle-accurate simulator, show that, using a common CNN inference accelerator, EcoFlow 1) reduces end-to-end CNN training time by 7-85%, and 2) improves end-to-end GAN training performance by 29-42%, compared to state-of-the-art CNN dataflows.
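The zero padding the abstract refers to comes from lowering dilated and transposed convolutions to direct convolutions by inserting zeros: a dilated convolution stuffs zeros between kernel weights, and a transposed convolution stuffs zeros between input elements, so a spatial accelerator spends many of its multiply-accumulates on zeros. A minimal 1-D sketch (not from the paper; `zero_stuff` is a hypothetical helper for illustration):

```python
import numpy as np

def zero_stuff(x, stride):
    """Insert (stride - 1) zeros between the elements of a 1-D array.

    This is the lowering step that turns a stride-`s` transposed
    convolution (applied to the input) or a dilation-`s` convolution
    (applied to the kernel) into a plain direct convolution.
    """
    out = np.zeros((len(x) - 1) * stride + 1)
    out[::stride] = x
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
xs = zero_stuff(x, stride=2)       # [1, 0, 2, 0, 3, 0, 4]
zero_fraction = np.mean(xs == 0)   # 3/7 of the stuffed input is zeros
```

Every multiplication against one of those stuffed zeros is wasted work on a direct-convolution dataflow; avoiding these multiplications without redesigning the array is the inefficiency EcoFlow's dataflows target.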
Pages: 2275-2289 (15 pages)