EERA-KWS: A 163 TOPS/W Always-on Keyword Spotting Accelerator in 28nm CMOS Using Binary Weight Network and Precision Self-Adaptive Approximate Computing

Cited by: 12
Authors
Liu, Bo [1]
Wang, Zhen [2]
Fan, Hu [1]
Yang, Jing [1]
Zhu, Wentao [1]
Huang, Lepeng [1]
Gong, Yu [1]
Ge, Wei [1]
Shi, Longxing [1]
Affiliations
[1] Southeast Univ, Natl ASIC Syst Engn Technol Res Ctr, Nanjing 210096, Jiangsu, Peoples R China
[2] Nanjing Prochip Elect Technol Co Ltd, Nanjing 210001, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Keyword spotting; binary weight network; approximate computing; reconfigurable architecture
DOI
10.1109/ACCESS.2019.2924340
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes an energy-efficient reconfigurable accelerator for keyword spotting (EERA-KWS) based on a binary weight network (BWN) and fabricated in 28-nm CMOS technology. The keyword spotting system consists of two parts: feature extraction based on mel-scale frequency cepstral coefficients (MFCC) and keyword classification based on a BWN model, which is trained on Google's Speech Commands database and deployed on our custom. To reduce power consumption while maintaining recognition accuracy, we first optimize the MFCC implementation with approximate computing techniques, including pre-emphasis coefficient transformation, rectangular Mel filtering, framing, and FFT optimization. Then, we propose a precision self-adaptive reconfigurable accelerator with digital-analog mixed approximate computing units to process the BWN efficiently. Based on SNR prediction of the background noise and post-detection of the network output confidence, the BWN accelerator data path can be dynamically and adaptively reconfigured as 4, 8, or 16 bits. For the BWN accelerator, we propose a time-delay based addition unit that performs bit-wise approximate computing for the convolution layers and fully connected layers, and a LUT-based unit for the activation layers. Implemented in TSMC 28 nm HPC+ process technology, the design has an estimated power of 77.8 μW to 115.9 μW and achieves an energy efficiency of 163 TOPS/W, which is over 1.8× better than the state-of-the-art architecture.
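As an illustration of why a binary weight network suits an addition-based data path, the following is a minimal software sketch, not the authors' hardware design. It assumes the common BWN formulation in which each weight is reduced to its sign plus one shared scale factor, so the inner loop needs only additions and subtractions. The function names, the uniform quantizer, and the thresholds in `choose_bits` are hypothetical and are not taken from the paper; the abstract states only that the data path width (4, 8, or 16 bits) is selected from the predicted SNR and the output confidence.

```python
def binarize(weights):
    """Map real weights to +1/-1 signs plus one scale (mean absolute value)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def quantize(x, bits, x_max=1.0):
    """Uniformly quantize x in [-x_max, x_max] to a signed integer code."""
    levels = 2 ** (bits - 1) - 1
    clipped = max(-x_max, min(x_max, x))
    return round(clipped / x_max * levels)


def bwn_dot(activations, signs, scale, bits, x_max=1.0):
    """Dot product with binary weights: the loop only adds or subtracts."""
    levels = 2 ** (bits - 1) - 1
    acc = 0
    for a, s in zip(activations, signs):
        q = quantize(a, bits, x_max)
        acc = acc + q if s > 0 else acc - q
    # A single multiplication rescales the integer accumulator at the end.
    return acc * scale * x_max / levels


def choose_bits(snr_db, confidence, snr_hi=20.0, snr_lo=5.0, conf_min=0.9):
    """Hypothetical policy: thresholds are illustrative, not from the paper."""
    if snr_db >= snr_hi and confidence >= conf_min:
        return 4
    if snr_db >= snr_lo:
        return 8
    return 16


if __name__ == "__main__":
    signs, scale = binarize([0.5, -0.25, 0.75, -0.5])
    acts = [0.5, 0.5, -0.25, 1.0]
    for bits in (4, 8, 16):
        print(bits, bwn_dot(acts, signs, scale, bits))
```

In this sketch a narrower width gives a coarser result for the same inputs, which is the accuracy-for-energy trade-off that a precision-adaptive data path exploits.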
Pages: 82453-82465
Number of pages: 13