Towards Memory-Efficient Processing-in-Memory Architecture for Convolutional Neural Networks

Times Cited: 0
Authors
Wang, Yi [1 ]
Zhang, Mingxu [1 ,3 ]
Yang, Jing [2 ]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[2] Harbin Inst Technol, Expt & Innovat Practice Ctr, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Processing-in-memory; neuromorphic computing; non-volatile memory; scheduling; parallel computing;
DOI
10.1145/3078633.3081032
CLC Classification Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Convolutional neural networks (CNNs) are widely adopted in artificial intelligence systems. In contrast to conventional computation-centric applications, the computation and memory accesses of CNN applications are tightly coupled through the network weights. This incurs a significant amount of data movement, especially for high-dimensional convolutions. Although the recent embedded 3D-stacked Processing-in-Memory (PIM) architecture alleviates this memory bottleneck and provides fast near-data processing, memory is still a limiting factor of the entire system. A key unsolved challenge is how to efficiently allocate convolutions to 3D-stacked PIM so as to combine the advantages of both neural and computational processing. This paper presents Memolution, a compiler-based, memory-efficient data allocation strategy for convolutional neural networks on PIM architectures. Memolution offers thread-level parallelism that fully exploits the computational power of the PIM architecture. The objective is to capture the characteristics of neural network applications and to provide a hardware-independent design that transparently allocates CNN applications onto the underlying hardware resources provided by PIM. We demonstrate the viability of the proposed technique using a variety of realistic convolutional neural network applications. Our extensive evaluation shows that Memolution significantly improves performance and cache utilization compared to the baseline scheme.
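The abstract describes Memolution only at a high level; the allocation algorithm itself is not reproduced in this record. As a rough illustration of the kind of thread-level, capacity-aware tile-to-vault mapping such a compiler pass might perform, the Python sketch below greedily assigns convolution output tiles to the least-loaded vault of a 3D-stacked PIM. All names (Vault, tile_bytes, allocate), the vault count, memory budgets, and the greedy least-loaded policy are assumptions made for illustration, not the paper's actual method.

# Illustrative sketch only: the paper's allocation algorithm is not given in
# this record. Assumes a 3D-stacked PIM with N vaults, each holding a small
# local memory; each convolution output tile becomes one thread mapped to a
# vault so tiles execute in parallel near their data.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Vault:
    vid: int
    capacity: int                                  # local memory budget in bytes (assumed)
    used: int = 0
    tiles: List[Tuple[int, int]] = field(default_factory=list)

def tile_bytes(th: int, tw: int, c: int, k: int, elem: int = 2) -> int:
    """Footprint of one output tile: input region with halo, plus filter weights."""
    return ((th + k - 1) * (tw + k - 1) * c + k * k * c) * elem

def allocate(out_h: int, out_w: int, c: int, k: int,
             th: int, tw: int, vaults: List[Vault]) -> None:
    """Greedy, capacity-aware placement of convolution tiles onto PIM vaults
    (an assumed policy, standing in for Memolution's allocation strategy)."""
    tiles = [(y, x) for y in range(0, out_h, th) for x in range(0, out_w, tw)]
    need = tile_bytes(th, tw, c, k)
    for t in tiles:
        v = min(vaults, key=lambda v: v.used)      # least-loaded vault first
        if v.used + need > v.capacity:
            raise MemoryError(f"tile {t} does not fit in vault {v.vid}")
        v.tiles.append(t)
        v.used += need

# Example: a 56x56x64 output layer with 3x3 filters, 14x14 tiles, 8 vaults.
vaults = [Vault(vid=i, capacity=1 << 20) for i in range(8)]
allocate(out_h=56, out_w=56, c=64, k=3, th=14, tw=14, vaults=vaults)
for v in vaults:
    print(f"vault {v.vid}: {len(v.tiles)} tiles, {v.used} bytes")

The least-loaded heuristic here simply balances local-memory pressure across vaults; a real compiler pass would also weigh data reuse between neighboring tiles, which is the kind of characteristic the paper says Memolution captures.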
Pages: 81-90
Number of Pages: 10