Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Cited by: 3
Authors
Choong, Benjamin Chen Ming [1 ]
Luo, Tao [2 ]
Liu, Cheng [3 ]
He, Bingsheng [4 ]
Zhang, Wei [5 ]
Zhou, Joey Tianyi [6 ]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, 4 Engn Dr 3, Singapore 117583, Singapore
[2] Agcy Sci Technol & Res, Inst High Performance Comp, 1 Fusionopolis Way,16-16 Connexis, Singapore 138632, Singapore
[3] Chinese Acad Sci, Inst Comp Technol, 6 Kexueyuan South Rd, Beijing 100190, Peoples R China
[4] Natl Univ Singapore, Sch Comp, COM1,13 Comp Dr, Singapore 117417, Singapore
[5] Hong Kong Univ Sci & Technol, Kowloon, Clear Water Bay, Hong Kong, Peoples R China
[6] ASTAR, Ctr Frontier AI Res, 1 Fusionopolis Way,16-16 Connexis, Singapore 138632, Singapore
Keywords
Artificial intelligence; Hardware-software co-design; Deep learning; Embedded systems; Emerging memory; NEURAL-NETWORKS; ENERGY; ARCHITECTURE; ACCELERATOR; MACHINE; ADDER
DOI
10.1016/j.sysarc.2022.102507
CLC number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Deep neural networks generate and process large volumes of data, posing challenges for resource-constrained embedded systems. In-memory computing has been demonstrated to be an efficient computing infrastructure and shows promise for embedded AI applications. Among emerging memory technologies, racetrack memory is a non-volatile technology that permits high-density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate (MAC) operations. Moreover, we explore the design space of racetrack memory-based systems and CNN model architectures, employing co-design to improve the efficiency and performance of CNN inference in racetrack memory while maintaining model accuracy. The designed circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack memory-based embedded systems.
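The abstract's in-memory computing cells are built around the multiply-and-accumulate (MAC) primitive, which dominates CNN inference cost. As an illustrative sketch only (not the paper's circuit design; the function name and layout are hypothetical), a 2D convolution layer reduces to nested MAC loops like these:

```python
def conv2d_mac(ifmap, kernel):
    """Valid-padding 2D convolution over a 2D input feature map,
    written as explicit multiply-and-accumulate (MAC) operations.
    Every `acc += a * w` is one MAC -- the operation an in-memory
    computing cell would perform next to the stored weights."""
    H, W = len(ifmap), len(ifmap[0])
    K = len(kernel)  # assume a square K x K kernel
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            acc = 0.0
            for ki in range(K):
                for kj in range(K):
                    acc += ifmap[i + ki][j + kj] * kernel[ki][kj]  # one MAC
            out[i][j] = acc
    return out
```

For a full CNN layer these loops repeat across input and output channels, which is why the number and area of MAC cells per memory bank drives the accelerator's density and energy trade-off described above.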
Pages: 20
Cited references
68 in total
[1] Aimar, Alessandro; Mostafa, Hesham; Calabrese, Enrico; Rios-Navarro, Antonio; Tapiador-Morales, Ricardo; Lungu, Iulia-Alexandra; Milde, Moritz B.; Corradi, Federico; Linares-Barranco, Alejandro; Liu, Shih-Chii; Delbruck, Tobi. NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(3):644-656.
[2] Anonymous, 2015, PROC INT C LEARNING.
[3] Booth, A. D. A Signed Binary Multiplication Technique. Quarterly Journal of Mechanics and Applied Mathematics, 1951, 4(2):236-240.
[4] Chauwin, Maverick; Hu, Xuan; Garcia-Sanchez, Felipe; Betrabet, Neilesh; Paler, Alexandru; Moutafis, Christoforos; Friedman, Joseph S. Skyrmion Logic System for Large-Scale Reversible Computation. Physical Review Applied, 2019, 12(6).
[5] Chen, Cen; Li, Kenli; Ouyang, Aijia; Li, Keqin. FlinkCL: An OpenCL-Based In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data. IEEE Transactions on Computers, 2018, 67(12):1765-1779.
[6] Chen, Cen; Li, Kenli; Ouyang, Aijia; Zeng, Zeng; Li, Keqin. GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(6):1275-1288.
[7] Chen, Cen; Li, Kenli; Ouyang, Aijia; Tang, Zhuo; Li, Keqin. GPU-Accelerated Parallel Hierarchical Extreme Learning Machine on Flink for Big Data. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(10):2740-2753.
[8] Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel S.; Sze, Vivienne. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits, 2017, 52(1):127-138.
[9] Chen, Zhengguo; Deng, Quan; Xiao, Nong; Pruhs, Kirk; Zhang, Youtao. DWMAcc: Accelerating Shift-based CNNs with Domain Wall Memories. ACM Transactions on Embedded Computing Systems, 2019, 18(5).
[10] Ding, R. Z., 2018, Asia and South Pacific Design Automation Conference (ASP-DAC), p. 1. DOI: 10.1109/ASPDAC.2018.8297274.