Speaker extraction network with attention mechanism for speech dialogue system

Cited by: 1
Authors
Hao, Yun [1]
Wu, Jiaju [1]
Huang, Xiangkang [1]
Zhang, Zijia [1]
Liu, Fei [1]
Wu, Qingyao [1, 2]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech dialogue system; Speech separation; Multi-task; Attention; SEPARATION; ENHANCEMENT;
DOI
10.1007/s11761-022-00340-w
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Speech dialogue systems are now widely used in many fields, letting users interact and communicate with the system through natural language. In practical dialogue scenes, however, third-person background speech and background noise interfere with the target speech. This interference seriously damages the intelligibility of the speech signal and degrades speech recognition performance. To tackle this, we exploit a speech separation method that separates the target speech from complex multi-speaker speech. We propose a multi-task attention mechanism and select TFCN as our audio feature extraction module. Following the multi-task approach, we train jointly with an SI-SDR loss and a cross-entropy speaker classification loss, and then use the attention mechanism to further exclude the background vocals from the mixed speech. We evaluate our results not only with the distortion metrics SI-SDR and SDR but also with a speech recognition system. To train our model and demonstrate its effectiveness, we build a background vocal removal data set based on a common data set. Experimental results show that our model significantly improves the performance of the speech separation model.
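
The joint objective named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the weighting factor `ce_weight`, and the way the two terms are summed are assumptions made for illustration, since the abstract does not state how the losses are combined. SI-SDR follows its standard definition (project the estimate onto the target, then compare projection energy with residual energy).

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimal scaling.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    return 10.0 * np.log10((np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps))


def speaker_cross_entropy(logits, speaker_id):
    """Cross-entropy of one speaker label given unnormalized class scores."""
    shifted = logits - logits.max()  # shift for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[speaker_id]


def joint_loss(estimate, target, logits, speaker_id, ce_weight=0.5):
    """Hypothetical multi-task loss: negative SI-SDR plus weighted cross-entropy.

    ce_weight is an assumed hyperparameter, not a value from the paper.
    """
    return -si_sdr(estimate, target) + ce_weight * speaker_cross_entropy(logits, speaker_id)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)          # stand-in for target speech
    interference = rng.standard_normal(16000)   # stand-in for background vocals
    estimate = clean + 0.1 * interference
    logits = np.array([2.0, 0.0, 0.0])          # scores for three hypothetical speakers
    print("SI-SDR (dB):", si_sdr(estimate, clean))
    print("joint loss:", joint_loss(estimate, clean, logits, speaker_id=0))
```

Minimizing the negative SI-SDR pushes the separated waveform toward the target speech, while the cross-entropy term encourages the network to keep speaker-discriminative information; how the paper balances the two is not given in the abstract.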
Pages: 111-119
Number of pages: 9