Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Cited by: 3
Authors
Choong, Benjamin Chen Ming [1 ]
Luo, Tao [2 ]
Liu, Cheng [3 ]
He, Bingsheng [4 ]
Zhang, Wei [5 ]
Zhou, Joey Tianyi [6 ]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, 4 Engn Dr 3, Singapore 117583, Singapore
[2] Agcy Sci Technol & Res, Inst High Performance Comp, 1 Fusionopolis Way,16-16 Connexis, Singapore 138632, Singapore
[3] Chinese Acad Sci, Inst Comp Technol, 6 Kexueyuan South Rd, Beijing 100190, Peoples R China
[4] Natl Univ Singapore, Sch Comp, COM1,13 Comp Dr, Singapore 117417, Singapore
[5] Hong Kong Univ Sci & Technol, Kowloon, Clear Water Bay, Hong Kong, Peoples R China
[6] ASTAR, Ctr Frontier AI Res, 1 Fusionopolis Way,16-16 Connexis, Singapore 138632, Singapore
Keywords
Artificial intelligence; Hardware-software co-design; Deep learning; Embedded systems; Emerging memory; NEURAL-NETWORKS; ENERGY; ARCHITECTURE; ACCELERATOR; MACHINE; ADDER
DOI
10.1016/j.sysarc.2022.102507
CLC number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Deep neural networks generate and process large volumes of data, posing challenges for resource-constrained embedded systems. In-memory computing has been demonstrated to be an efficient computing infrastructure and shows promise for embedded AI applications. Among emerging memory technologies, racetrack memory is a non-volatile technology that permits high-density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate (MAC) operations. Moreover, we explore the design space of racetrack memory-based systems and CNN model architectures, employing co-design to improve the efficiency and performance of CNN inference in racetrack memory while maintaining model accuracy. The designed circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack memory-based embedded systems.
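The abstract's in-memory computing cells are built around the multiply-and-accumulate (MAC) primitive, which dominates CNN inference cost. As an illustrative sketch only (not the paper's circuit design; the function name and layout are hypothetical), a 2D convolution layer reduces to nested MAC loops like these:

```python
def conv2d_mac(ifmap, kernel):
    """Valid-padding 2D convolution over a 2D input feature map,
    written as explicit multiply-and-accumulate (MAC) operations.
    Every `acc += a * w` is one MAC -- the operation an in-memory
    computing cell would perform next to the stored weights."""
    H, W = len(ifmap), len(ifmap[0])
    K = len(kernel)  # assume a square K x K kernel
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            acc = 0.0
            for ki in range(K):
                for kj in range(K):
                    acc += ifmap[i + ki][j + kj] * kernel[ki][kj]  # one MAC
            out[i][j] = acc
    return out
```

For a full CNN layer these loops repeat across input and output channels, which is why the number and area of MAC cells per memory bank drives the accelerator's density and energy trade-off described above.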
Pages: 20
Cited references
68 in total
[1] Aimar, Alessandro; Mostafa, Hesham; Calabrese, Enrico; Rios-Navarro, Antonio; Tapiador-Morales, Ricardo; Lungu, Iulia-Alexandra; Milde, Moritz B.; Corradi, Federico; Linares-Barranco, Alejandro; Liu, Shih-Chii; Delbruck, Tobi. NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(3):644-656.
[2] Anonymous, 2015, PROC INT C LEARNING.
[3] Booth, A. D. A Signed Binary Multiplication Technique. Quarterly Journal of Mechanics and Applied Mathematics, 1951, 4(2):236-240.
[4] Chauwin, Maverick; Hu, Xuan; Garcia-Sanchez, Felipe; Betrabet, Neilesh; Paler, Alexandru; Moutafis, Christoforos; Friedman, Joseph S. Skyrmion Logic System for Large-Scale Reversible Computation. Physical Review Applied, 2019, 12(6).
[5] Chen, Cen; Li, Kenli; Ouyang, Aijia; Li, Keqin. FlinkCL: An OpenCL-Based In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data. IEEE Transactions on Computers, 2018, 67(12):1765-1779.
[6] Chen, Cen; Li, Kenli; Ouyang, Aijia; Zeng, Zeng; Li, Keqin. GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(6):1275-1288.
[7] Chen, Cen; Li, Kenli; Ouyang, Aijia; Tang, Zhuo; Li, Keqin. GPU-Accelerated Parallel Hierarchical Extreme Learning Machine on Flink for Big Data. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(10):2740-2753.
[8] Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel S.; Sze, Vivienne. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits, 2017, 52(1):127-138.
[9] Chen, Zhengguo; Deng, Quan; Xiao, Nong; Pruhs, Kirk; Zhang, Youtao. DWMAcc: Accelerating Shift-based CNNs with Domain Wall Memories. ACM Transactions on Embedded Computing Systems, 2019, 18(5).
[10] Ding, R. Z., 2018, Asia and South Pacific Design Automation Conference (ASP-DAC), p. 1. DOI: 10.1109/ASPDAC.2018.8297274.