Speech densely connected convolutional networks for small-footprint keyword spotting

Cited by: 0
Authors
Tsung-Han Tsai
Xin-Hui Lin
Affiliations
[1] Department of Electrical Engineering
[2] National Central University
Source
Multimedia Tools and Applications | 2023 / Vol. 82
Keywords
Keyword spotting; DenseNet; Group convolution; Depthwise separable convolution; SENet
DOI
Not available
Abstract
Keyword spotting is an important task for human-computer interaction (HCI). To protect privacy, recognition needs to be performed at the edge, so the goal is to improve accuracy as much as possible within a limited cost. This paper proposes a new keyword spotting technique based on a convolutional neural network (CNN), specifically densely connected convolutional networks (DenseNet). To make the model smaller, we replace standard convolution with group convolution and depthwise separable convolution. We add squeeze-and-excitation networks (SENet) to increase the weight of important features and thereby improve accuracy. To investigate the effect of different convolutions on DenseNet, we built two models: SpDenseNet and SpDenseNet-L. We validated the networks on the Google Speech Commands dataset. The proposed network achieved better accuracy than the other networks, even with fewer parameters and floating-point operations (FLOPs). SpDenseNet achieved an accuracy of 96.3% with 122.63 K trainable parameters and 142.7 M FLOPs; compared to the benchmark works, it uses only about 52% of the parameters and about 12% of the FLOPs. In addition, we varied the depth and width of the network to build a compact variant. It also outperforms other compact variants: SpDenseNet-L-narrow achieved an accuracy of 93.6% with 9.27 K trainable parameters and 3.47 M FLOPs. Compared to the benchmark works, the accuracy of SpDenseNet-L-narrow is improved by 3.5%, while it uses only about 47% of the parameters and about 48% of the FLOPs.
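The abstract attributes the smaller model size to swapping standard convolution for group and depthwise separable convolution. The sketch below illustrates why those substitutions reduce the weight count, using the standard parameter formulas for each layer type. The channel counts, kernel size, and group count are hypothetical values chosen for illustration, not the paper's configuration, and the helper names are not from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution split into `groups` groups (bias omitted)."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one k x k filter per channel
    pointwise = conv_params(c_in, c_out, 1)              # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer shape (assumption, not taken from the paper).
C_IN, C_OUT, K = 64, 64, 3

standard = conv_params(C_IN, C_OUT, K)
grouped = conv_params(C_IN, C_OUT, K, groups=4)
separable = depthwise_separable_params(C_IN, C_OUT, K)

print(f"standard:  {standard}")
print(f"grouped:   {grouped}")
print(f"separable: {separable}")
```

For this assumed layer shape, the group convolution with 4 groups needs a quarter of the standard weights, and the depthwise separable version needs roughly an eighth.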
Pages: 39119-39137
Number of pages: 18