An Anchor-Free Detector for Continuous Speech Keyword Spotting

Authors
Zhao, Zhiyuan [1]
Tang, Chuanxin [1]
Yao, Chengdong [2]
Luo, Chong [1]
Affiliations
[1] Microsoft Res Asia, Beijing, Peoples R China
[2] Univ Technol Sydney, Sydney, NSW, Australia
Source
INTERSPEECH 2022 | 2022
Keywords
keyword spotting; continuous speech keyword spotting; speech recognition; anchor-free detector; open dataset
DOI
10.21437/Interspeech.2022-296
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Continuous Speech Keyword Spotting (CSKWS) is the task of detecting predefined keywords in continuous speech. In this paper, we regard CSKWS as a one-dimensional object detection task and propose a novel anchor-free detector, named AF-KWS, to solve the problem. AF-KWS directly regresses the center locations and lengths of the keywords through a single-stage deep neural network. In particular, AF-KWS is tailored for this speech task: we introduce an auxiliary unknown class to distinguish other (non-keyword) words from non-speech or silent background. We have built two benchmark datasets for CSKWS, named LibriTop-20 and the continuous meeting analysis keywords (CMAK) dataset. Evaluations on these two datasets show that our proposed AF-KWS outperforms reference schemes by a large margin, and therefore provides a decent baseline for future research.
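The following is an illustrative sketch only, not the authors' implementation. It shows how the outputs of a one-dimensional anchor-free detector of the kind the abstract describes could be decoded into keyword intervals. The function name `decode_keywords`, the per-frame score layout, the 0.5 threshold, and the local-peak rule are all assumptions made for illustration; the record does not specify how AF-KWS post-processes its predictions.

```python
from typing import List, Sequence, Tuple

# (class index, start frame, end frame, score); a hypothetical output format.
Detection = Tuple[int, float, float, float]


def decode_keywords(
    heatmap: Sequence[Sequence[float]],
    lengths: Sequence[float],
    unknown_class: int,
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Detection]:
    """Turn per-frame center scores and lengths into keyword intervals.

    heatmap[c][t] is the score that frame t is the center of a word of
    class c; lengths[t] is the predicted word length (in frames) at t.
    The auxiliary unknown class is scored by the network but never
    reported as a keyword.
    """
    num_frames = len(lengths)
    detections: List[Detection] = []
    for c, row in enumerate(heatmap):
        if c == unknown_class:
            continue  # non-keyword words are not emitted
        for t, score in enumerate(row):
            if score < threshold:
                continue
            left = row[t - 1] if t > 0 else float("-inf")
            right = row[t + 1] if t + 1 < num_frames else float("-inf")
            # Keep only local peaks; on a plateau, keep its last frame.
            if score >= left and score > right:
                half = lengths[t] / 2.0
                start = max(0.0, t - half)
                end = min(float(num_frames), t + half)
                detections.append((c, start, end, score))
    # Highest-scoring detections first.
    return sorted(detections, key=lambda d: -d[3])
```

With two keyword classes plus an unknown class at index 2, the call `decode_keywords(heatmap, lengths, unknown_class=2)` returns one interval per center peak above the threshold. A peak in the unknown row is skipped, which is one simple way to keep non-keyword speech out of the output.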
Pages: 3228-3232
Number of pages: 5