Text-aware Speech Separation for Multi-talker Keyword Spotting

Cited by: 0
Authors
Li, Haoyu [1 ]
Yang, Baochen [1 ]
Xi, Yu [1 ]
Yu, Linfeng [1 ]
Tan, Tian [1 ]
Li, Hao [2 ]
Yu, Kai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, AI Inst, X LANCE Lab, Shanghai, Peoples R China
[2] AISpeech Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2024
Keywords
multi-talker keyword spotting; text-aware speech separation; robustness;
DOI
10.21437/Interspeech.2024-789
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed-speech scenarios. Unlike the usual cocktail-party problem, where multi-talker speech is separated using speaker clues, the key challenge here is to extract the target speech for KWS based on text clues. To address this challenge, this paper proposes a novel Text-aware Permutation Determinization Training method for multi-talker KWS with a clue-based Speech Separation front-end (TPDT-SS). Our research highlights the critical role of SS front-ends and shows that incorporating keyword-specific clues into these models can greatly enhance their effectiveness. TPDT-SS shows remarkable success in addressing permutation problems in mixed keyword speech, thereby greatly boosting the performance of the KWS back-end. Additionally, fine-tuning our system on unseen mixed speech yields further performance improvement.
Pages: 337-341
Page count: 5
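
Note: The abstract above contrasts clue-based permutation determinization with the usual permutation handling in speech separation. As an illustration only, the sketch below compares standard permutation invariant training (PIT), which searches over all output-to-reference assignments for the lowest reconstruction loss, with a text-aware variant that fixes the assignment using a keyword clue. The two-talker setup, the MSE training loss, and the hypothetical keyword_score_fn scorer are assumptions for illustration, not the formulation used in the paper.

    # Minimal sketch: PIT vs. a text-aware (clue-determined) assignment.
    # All specifics here (MSE loss, two talkers, keyword_score_fn) are assumed.
    import itertools
    import torch
    import torch.nn.functional as F

    def pit_mse(est, ref):
        # Standard permutation invariant training loss: try every ordering of
        # the estimated sources and keep the assignment with the lowest MSE.
        # est, ref: tensors of shape (n_src, time)
        losses = [
            sum(F.mse_loss(est[p], ref[i]) for i, p in enumerate(perm))
            for perm in itertools.permutations(range(est.size(0)))
        ]
        return torch.stack(losses).min()

    def text_determined_mse(est, ref_keyword, ref_other, keyword_score_fn):
        # Text-aware determinization (illustrative): a keyword-clue scorer picks
        # the output channel most likely to contain the keyword, and that channel
        # is trained against the keyword reference; no permutation search is run.
        # keyword_score_fn: hypothetical scorer mapping a waveform to a scalar score.
        scores = [float(keyword_score_fn(est[c])) for c in range(est.size(0))]
        kw_ch = max(range(len(scores)), key=scores.__getitem__)
        other_ch = 1 - kw_ch  # two-talker mixture assumed
        return F.mse_loss(est[kw_ch], ref_keyword) + F.mse_loss(est[other_ch], ref_other)

The intuition this sketch tries to convey is the one stated in the abstract: using the keyword (text) clue to determine the output-to-target assignment removes the ambiguity that PIT resolves only implicitly, which is what allows the separation front-end to better serve the KWS back-end.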