Energy-friendly keyword spotting system using add-based convolution

Cited by: 1
Authors
Zhou, Hang [1]
Hu, Wenchao [1]
Yeung, Yu Ting [1]
Chen, Xiao [1]
Affiliations
[1] Huawei Noah's Ark Lab, Hong Kong, People's Republic of China
Source
INTERSPEECH 2021 | 2021
Keywords
keyword spotting; energy-friendly; human-computer interaction
DOI
10.21437/Interspeech.2021-458
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
The wake-up keyword of a keyword spotting (KWS) system represents the brand name of a smart device. KWS performance is also crucial for modern speech-based human-device interaction. An on-device KWS system with both high accuracy and low power consumption is therefore desirable. We propose a KWS system with add-based convolution layers, named Add TC-ResNet. Add-based convolution offers a new way to reduce the power consumption of a KWS system, because addition is more energy efficient than multiplication at the hardware level. On the Google Speech Commands dataset V2, Add TC-ResNet achieves an accuracy of 97.1% with 99% of multiplication operations replaced by addition operations. This result is competitive with a state-of-the-art, fully multiplication-based TC-ResNet KWS system. We also investigate knowledge distillation and a mixed addition-multiplication design for the proposed KWS system, which lead to further performance improvement.
Pages: 4234-4238
Number of pages: 5
Related papers
21 in total
  • [1] Alvarez R, 2019, INT CONF ACOUST SPEE, P6336, DOI 10.1109/ICASSP.2019.8683557
  • [2] Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
    Arik, Sercan O.
    Kliegl, Markus
    Child, Rewon
    Hestness, Joel
    Gibiansky, Andrew
    Fougner, Chris
    Prenger, Ryan
    Coates, Adam
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 1606-1610
  • [3] Bluche Théodore, 2020, ARXIV200210851
  • [4] Chen G., 2014, P IEEE INT C AC SPEE, P4087, DOI 10.1109/ICASSP.2014.6854370
  • [5] AdderNet: Do We Really Need Multiplications in Deep Learning?
    Chen, Hanting
    Wang, Yunhe
    Xu, Chunjing
    Shi, Boxin
    Xu, Chao
    Tian, Qi
    Xu, Chang
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 1465-1474
  • [6] Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
    Choi, Seungwoo
    Seo, Seokjun
    Shin, Beomjun
    Byun, Hyeongmin
    Kersner, Martin
    Kim, Beomsu
    Kim, Dongyoung
    Ha, Sungjoo
    [J]. INTERSPEECH 2019, 2019: 3372-3376
  • [7] Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection
    Higuchi, Takuya
    Ghasemzadeh, Mohammad
    You, Kisun
    Dhir, Chandra
    [J]. INTERSPEECH 2020, 2020: 2592-2596
  • [8] Hinton G, 2015, CORR
  • [9] Horowitz M, 2014, ISSCC DIG TECH PAP I, V57, P10, DOI 10.1109/ISSCC.2014.6757323
  • [10] SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
    Park, Daniel S.
    Chan, William
    Zhang, Yu
    Chiu, Chung-Cheng
    Zoph, Barret
    Cubuk, Ekin D.
    Le, Quoc V.
    [J]. INTERSPEECH 2019, 2019: 2613-2617