Speaker extraction network with attention mechanism for speech dialogue system

Cited by: 1

Authors:

Hao, Yun [1]

Wu, Jiaju [1]

Huang, Xiangkang [1]

Zhang, Zijia [1]

Liu, Fei [1]

Wu, Qingyao [1,2]

Affiliations:

[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China

[2] Pazhou Lab, Guangzhou, Peoples R China

Source:

SERVICE ORIENTED COMPUTING AND APPLICATIONS | 2022 / Vol. 16 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Speech dialogue system; Speech separation; Multi-task; Attention; SEPARATION; ENHANCEMENT;

DOI:

10.1007/s11761-022-00340-w

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Speech dialogue systems are now widely used in various fields, letting users interact and communicate with the system through natural language. In practical dialogue scenes, however, third-person background speech and background noise interfere with the signal. This interference seriously damages the intelligibility of the speech signal and decreases speech recognition performance. To tackle this, we exploit a speech separation method that separates the target speech from complex multi-person speech. We propose a multi-task attention mechanism and select TFCN as our audio feature extraction module. Based on the multi-task method, we use the SI-SDR and cross-entropy speaker classification loss functions for joint training, and then use the attention mechanism to further exclude the background vocals in the mixed speech. We evaluate our results not only with the distortion metrics SI-SDR and SDR, but also with a speech recognition system. To train our model and demonstrate its effectiveness, we build a background vocal removal data set based on a common data set. Experimental results show that our model significantly improves the performance of the speech separation model.
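The joint training objective mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `ce_weight` balancing factor, and the choice to simply add the two terms are assumptions made for illustration, since the abstract does not say how the losses are combined. The sketch computes SI-SDR for a single-channel estimate, a cross-entropy speaker classification loss from raw logits, and a weighted sum of the two.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (dB) for 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def cross_entropy(logits, speaker_id):
    """Cross-entropy of one speaker label given unnormalized logits."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[speaker_id]


def joint_loss(estimate, target, logits, speaker_id, ce_weight=0.5):
    """Negative SI-SDR plus a weighted speaker classification loss.

    ce_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return -si_sdr(estimate, target) + ce_weight * cross_entropy(logits, speaker_id)
```

Minimizing the negative SI-SDR pushes the extracted waveform toward the target speech, while the classification term encourages features that identify the target speaker. In practice the same two terms would be written in an automatic-differentiation framework so that gradients flow into the network.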

Pages: 111-119

Page count: 9
