Speaker extraction network with attention mechanism for speech dialogue system

Cited by: 1
Authors
Hao, Yun [1]
Wu, Jiaju [1]
Huang, Xiangkang [1]
Zhang, Zijia [1]
Liu, Fei [1]
Wu, Qingyao [1, 2]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech dialogue system; Speech separation; Multi-task; Attention; SEPARATION; ENHANCEMENT;
DOI
10.1007/s11761-022-00340-w
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
Speech dialogue systems are now widely used in many fields, letting users interact with the system through natural language. In real dialogue scenes, however, third-party background speech and background noise interfere with the user's voice. This interference seriously degrades the intelligibility of the speech signal and lowers speech recognition performance. To tackle this problem, we use a speech separation method that separates the target speech from complex multi-speaker speech. We propose a multi-task attention mechanism and select TFCN as the audio feature extraction module. Following the multi-task approach, we jointly train with an SI-SDR loss and a cross-entropy speaker classification loss, and then use the attention mechanism to further exclude background vocals from the mixed speech. We evaluate our results not only with the distortion metrics SI-SDR and SDR but also with a speech recognition system. To train our model and demonstrate its effectiveness, we build a background vocal removal data set based on a common data set. Experimental results show that our model significantly improves the performance of the speech separation model.
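The joint training objective named in the abstract (an SI-SDR loss combined with a cross-entropy speaker classification loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the two terms are weighted, so the weight `alpha`, the function names, and the single-utterance 1-D signal layout are all assumptions made for illustration only.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (dB) for 1-D signals."""
    # Remove the mean so the measure ignores any DC offset.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimally scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)


def cross_entropy(logits, speaker_id):
    """Cross-entropy of one utterance's speaker logits against the true speaker index."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[speaker_id]


def joint_loss(estimate, target, logits, speaker_id, alpha=0.5):
    """Negative SI-SDR plus a weighted speaker classification loss.

    alpha is a hypothetical weight; the abstract does not specify one.
    """
    return -si_sdr(estimate, target) + alpha * cross_entropy(logits, speaker_id)
```

Minimizing the negative SI-SDR pushes the separated waveform toward the target speech, while the classification term encourages features that discriminate the target speaker; how the paper balances the two is not stated in this record.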
Pages: 111-119
Page count: 9