High-Speed Power-Efficient Coarse-Grained Convolver Architecture using Depth-First Compression Scheme

Cited by: 6
Authors
Wu, Yi-Lin [1]
Lu, Yi [1]
Huang, Juinn-Dar [1]
Affiliations
[1] Natl Chiao Tung Univ, Inst Elect, Hsinchu, Taiwan
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS) | 2020
Keywords
convolutional neural network (CNN); hardware accelerator; convolver design; multiply-accumulate operation; depth-first compression; compensation vector; multiplier
DOI
10.1109/iscas45731.2020.9180406
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics & Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Convolutional neural networks (CNNs) play an important role in various applications, e.g., computer vision. Since CNN computations require numerous multiply-accumulate (MAC) operations, performing them efficiently is a crucial issue for CNN hardware accelerators. In this paper, we propose a high-speed, power-efficient convolver architecture for CNN acceleration. A 3x3 convolver is required to produce an output every cycle, which is commonly accomplished by summing up the results of nine parallel multiplications and requires ten carry-propagation adders (CPAs) in total. The proposed coarse-grained convolver, however, breaks the boundary between multipliers and reduces all partial products in a more global way; consequently, it requires only one CPA to generate the final outcome. It also features a globally delay-optimized partial product reduction tree and a depth-first compression scheme for both area and power minimization. The proposed convolver has been implemented in TSMC 40 nm technology. Compared to a conventional 3x3 convolver baseline design, our design reduces area and power by 15.8% and 26.5%, respectively, at a clock rate of 1 GHz.
Pages: 5