Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting

Cited by: 18
Authors
Xu, Menglong [1]
Zhang, Xiao-Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen, Peoples R China
Source
Funding
National Science Foundation (USA);
Keywords
keyword spotting; depthwise separable convolution; squeeze-and-excitation block;
DOI
10.21437/Interspeech.2020-1045
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
A difficult problem in keyword spotting is how to minimize the memory footprint while maintaining high precision. Although convolutional neural networks have been shown to be effective for small-footprint keyword spotting, they still need hundreds of thousands of parameters to achieve good performance. In this paper, we propose an efficient model based on depthwise separable convolution layers and squeeze-and-excitation blocks. Specifically, we replace the standard convolution with the depthwise separable convolution, which reduces the number of parameters without significant performance degradation. We further improve the performance of the depthwise separable convolution by reweighting the output feature maps of the first convolution layer with a squeeze-and-excitation block. We compare the proposed method with five representative models in two experimental settings of the Google Speech Commands dataset. Experimental results show that the proposed method achieves state-of-the-art performance. For example, it achieves a classification error rate of 3.29% with 72K parameters in the first experiment, which significantly outperforms the comparison methods at a similar model size. It achieves an error rate of 3.97% with 10K parameters, which is also slightly better than the state-of-the-art comparison method at a similar model size.
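The two ideas named in the abstract can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation: the channel counts and kernel size below are illustrative assumptions rather than the paper's configuration, and `squeeze_excite` follows the commonly used squeeze-and-excitation formulation (global average pooling, two fully connected layers, sigmoid gating), which the abstract does not spell out. All function names are hypothetical.

```python
import math


def standard_conv_params(c_in, c_out, k_h, k_w):
    """Weight count of a standard 2-D convolution (bias omitted)."""
    return c_in * c_out * k_h * k_w


def depthwise_separable_params(c_in, c_out, k_h, k_w):
    """Weight count of a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = c_in * k_h * k_w   # one k_h x k_w filter per input channel
    pointwise = c_in * c_out       # 1x1 convolution that mixes channels
    return depthwise + pointwise


def squeeze_excite(feature_maps, w1, w2):
    """Reweight channels with a squeeze-and-excitation gate (biases omitted).

    feature_maps: list of C channels, each a list of rows of floats.
    w1: R x C weight matrix of the bottleneck layer (assumed shape).
    w2: C x R weight matrix of the gating layer (assumed shape).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation: bottleneck layer with ReLU, then gating layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[g * v for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]


if __name__ == "__main__":
    # Illustrative sizes only: 64 input channels, 64 output channels, 3x3 kernel.
    print(standard_conv_params(64, 64, 3, 3))
    print(depthwise_separable_params(64, 64, 3, 3))
```

With these assumed sizes the standard layer needs 64 × 64 × 9 = 36,864 weights, while the depthwise separable version needs 64 × 9 + 64 × 64 = 4,672, roughly an eightfold reduction. The squeeze-and-excitation gate adds only the two small matrices `w1` and `w2` on top of that.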
Pages: 2547 - 2551
Page count: 5
Related papers
50 in total
  • [21] Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
    Chen, Mengzhe
    Zhang, Shiliang
    Lei, Ming
    Liu, Yong
    Yao, Haitao
    Gao, Jie
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2663 - 2667
  • [22] Text Anchor Based Metric Learning for Small-footprint Keyword Spotting
    Wang, Li
    Gu, Rongzhi
    Chen, Nuo
    Zou, Yuexian
    INTERSPEECH 2021, 2021, : 4219 - 4223
  • [23] Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting
    Ghandoura, Abdulkader
    Hjabo, Farouk
    Al Dakkak, Oumayma
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 102
  • [24] DSSEMFF: A Depthwise Separable Squeeze-and-excitation Based on Multi-feature Fusion for Image Classification
    Junjun Liu
    Jun Zhang
    Sensing and Imaging, 2022, 23
  • [25] STREAMING SMALL-FOOTPRINT KEYWORD SPOTTING USING SEQUENCE-TO-SEQUENCE MODELS
    He, Yanzhang
    Prabhavalkar, Rohit
    Rao, Kanishka
    Li, Wei
    Bakhtin, Anton
    McGraw, Ian
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 474 - 481
  • [26] DSSEMFF: A Depthwise Separable Squeeze-and-excitation Based on Multi-feature Fusion for Image Classification
    Liu, Junjun
    Zhang, Jun
    SENSING AND IMAGING, 2022, 23 (01):
  • [27] Domain Aware Training for Far-field Small-footprint Keyword Spotting
    Wu, Haiwei
    Jia, Yan
    Nie, Yuanfei
    Li, Ming
    INTERSPEECH 2020, 2020, : 2562 - 2566
  • [28] SMALL-FOOTPRINT KEYWORD SPOTTING ON RAW AUDIO DATA WITH SINC-CONVOLUTIONS
    Mittermaier, Simon
    Kuerzinger, Ludwig
    Waschneck, Bernd
    Rigoll, Gerhard
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7454 - 7458
  • [29] Small-footprint Spiking Neural Networks for Power-efficient Keyword Spotting
    Pedroni, Bruno U.
    Sheik, Sadique
    Mostafa, Hesham
    Paul, Somnath
    Augustine, Charles
    Cauwenberghs, Gert
    2018 IEEE BIOMEDICAL CIRCUITS AND SYSTEMS CONFERENCE (BIOCAS): ADVANCED SYSTEMS FOR ENHANCING HUMAN HEALTH, 2018, : 591 - 594
  • [30] A hybrid approach consisting of 3D depthwise separable convolution and depthwise squeeze-and-excitation network for hyperspectral image classification
    Asker, Mehmet Emin
    Gungor, Mustafa
    EARTH SCIENCE INFORMATICS, 2024, 17 (06) : 5795 - 5821