A Depthwise Separable Convolution Neural Network for Small-footprint Keyword Spotting Using Approximate MAC Unit and Streaming Convolution Reuse

Cited by: 0
Authors
Lu, Yicheng [1 ]
Shan, Weiwei [1 ]
Xu, Jiaming [1 ]
Affiliations
[1] Southeast Univ, Sch Elect Sci & Engn, Nanjing 210096, Peoples R China
Source
2019 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2019) | 2019
Keywords
Keyword spotting; Approximate computing; Data reuse; Depthwise separable convolution
DOI
10.1109/apccas47518.2019.8953096
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline codes
0808; 0809
Abstract
In recent years, voice wake-up technology has entered many everyday applications, and its key component is Keyword Spotting (KWS). A KWS system must continuously monitor ambient audio while waiting for a wake-up word, which demands both low power consumption and high recognition accuracy. This paper focuses on reducing the power consumption of real-time KWS systems. Based on Google's Speech Commands Dataset (GSCD), we construct and train a deep neural network built on Depthwise Separable Convolution (DS-Conv). We propose an Approximate Multiply-and-Accumulate unit (AP-MAC) and a data-reuse method called Streaming Convolution Reuse (SCR), and show that a network using AP-MACs saves 37.7%-42.6% of computing power while achieving a Word Error Rate (WER) similar to the same model using conventional MAC units on the KWS task. In addition, SCR allows the model to reuse convolution results across multiple audio frames, saving 94% of activation storage. Combining the two methods reduces the computing power and memory storage per audio frame of the baseline model by 98.5%-98.7% and 94%, respectively.
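The abstract's reliance on DS-Conv rests on a standard MAC-count argument: factoring a convolution into a depthwise and a pointwise stage cuts multiply-accumulate operations by roughly 1/C_out + 1/K². The sketch below works through that arithmetic; the layer shape used is illustrative, not taken from the paper.

```python
# Sketch: MAC-count comparison between a standard convolution and a
# depthwise separable convolution (DS-Conv). Shapes are hypothetical
# examples, not the paper's actual network configuration.

def standard_conv_macs(h, w, c_in, c_out, k):
    # Each of the h*w*c_out output values needs a k*k*c_in dot product.
    return h * w * c_out * k * k * c_in

def ds_conv_macs(h, w, c_in, c_out, k):
    # Depthwise stage: one k*k filter applied per input channel.
    depthwise = h * w * c_in * k * k
    # Pointwise stage: a 1x1 convolution that mixes channels.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 25, 5, 64, 64, 3  # illustrative feature-map shape
    std = standard_conv_macs(h, w, c_in, c_out, k)
    ds = ds_conv_macs(h, w, c_in, c_out, k)
    # Ratio equals 1/c_out + 1/k**2, about 0.127 for this shape.
    print(f"standard: {std} MACs, ds-conv: {ds} MACs, ratio: {ds / std:.3f}")
```

With these example dimensions the DS-Conv layer needs roughly an eighth of the MACs of the standard layer, which is the starting point the paper's AP-MAC and SCR optimizations then build on.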
Pages: 309-312 (4 pages)