Towards Memory-Efficient Processing-in-Memory Architecture for Convolutional Neural Networks

Times Cited: 0
Authors
Wang, Yi [1 ]
Zhang, Mingxu [1 ,3 ]
Yang, Jing [2 ]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[2] Harbin Inst Technol, Expt & Innovat Practice Ctr, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Processing-in-memory; neuromorphic computing; non-volatile memory; scheduling; parallel computing;
DOI
10.1145/3078633.3081032
CLC Classification Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Convolutional neural networks (CNNs) are widely adopted in artificial intelligence systems. In contrast to conventional computation-centric applications, the computation and memory accesses of CNN applications are tightly coupled through the network weights. This incurs a significant amount of data movement, especially for high-dimensional convolutions. Although the recent embedded 3D-stacked Processing-in-Memory (PIM) architecture alleviates this memory bottleneck and provides fast near-data processing, memory is still a limiting factor of the entire system. A key unsolved challenge is how to efficiently allocate convolutions to 3D-stacked PIM so as to combine the advantages of both neural and computational processing. This paper presents Memolution, a compiler-based, memory-efficient data allocation strategy for convolutional neural networks on PIM architectures. Memolution offers thread-level parallelism that fully exploits the computational power of the PIM architecture. The objective is to capture the characteristics of neural network applications and to provide a hardware-independent design that transparently allocates CNN applications onto the underlying hardware resources provided by PIM. We demonstrate the viability of the proposed technique using a variety of realistic convolutional neural network applications. Our extensive evaluation shows that Memolution significantly improves performance and cache utilization compared to the baseline scheme.
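The abstract describes Memolution only at a high level; the allocation algorithm itself is not reproduced in this record. As a rough illustration of the kind of thread-level, capacity-aware tile-to-vault mapping such a compiler pass might perform, the Python sketch below greedily assigns convolution output tiles to the least-loaded vault of a 3D-stacked PIM. All names (Vault, tile_bytes, allocate), the vault count, memory budgets, and the greedy least-loaded policy are assumptions made for illustration, not the paper's actual method.

# Illustrative sketch only: the paper's allocation algorithm is not given in
# this record. Assumes a 3D-stacked PIM with N vaults, each holding a small
# local memory; each convolution output tile becomes one thread mapped to a
# vault so tiles execute in parallel near their data.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Vault:
    vid: int
    capacity: int                                  # local memory budget in bytes (assumed)
    used: int = 0
    tiles: List[Tuple[int, int]] = field(default_factory=list)

def tile_bytes(th: int, tw: int, c: int, k: int, elem: int = 2) -> int:
    """Footprint of one output tile: input region with halo, plus filter weights."""
    return ((th + k - 1) * (tw + k - 1) * c + k * k * c) * elem

def allocate(out_h: int, out_w: int, c: int, k: int,
             th: int, tw: int, vaults: List[Vault]) -> None:
    """Greedy, capacity-aware placement of convolution tiles onto PIM vaults
    (an assumed policy, standing in for Memolution's allocation strategy)."""
    tiles = [(y, x) for y in range(0, out_h, th) for x in range(0, out_w, tw)]
    need = tile_bytes(th, tw, c, k)
    for t in tiles:
        v = min(vaults, key=lambda v: v.used)      # least-loaded vault first
        if v.used + need > v.capacity:
            raise MemoryError(f"tile {t} does not fit in vault {v.vid}")
        v.tiles.append(t)
        v.used += need

# Example: a 56x56x64 output layer with 3x3 filters, 14x14 tiles, 8 vaults.
vaults = [Vault(vid=i, capacity=1 << 20) for i in range(8)]
allocate(out_h=56, out_w=56, c=64, k=3, th=14, tw=14, vaults=vaults)
for v in vaults:
    print(f"vault {v.vid}: {len(v.tiles)} tiles, {v.used} bytes")

The least-loaded heuristic here simply balances local-memory pressure across vaults; a real compiler pass would also weigh data reuse between neighboring tiles, which is the kind of characteristic the paper says Memolution captures.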
Pages: 81-90
Number of Pages: 10