Speaker extraction network with attention mechanism for speech dialogue system

Cited by: 1
Authors
Hao, Yun [1 ]
Wu, Jiaju [1 ]
Huang, Xiangkang [1 ]
Zhang, Zijia [1 ]
Liu, Fei [1 ]
Wu, Qingyao [1 ,2 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech dialogue system; Speech separation; Multi-task; Attention; SEPARATION; ENHANCEMENT;
DOI
10.1007/s11761-022-00340-w
CLC number
TP39 [Computer Applications];
Discipline code
081203; 0835;
Abstract
Speech dialogue systems are now widely used in many fields, allowing users to interact with machines through natural language. In practical settings, however, real dialogue scenes contain third-person background speech and background noise. This interference severely degrades the intelligibility of the speech signal and reduces speech recognition performance. To tackle this problem, we adopt a speech separation method that separates the target speech from complex multi-speaker mixtures. We propose a multi-task attention mechanism and select TFCN as the audio feature extraction module. Within the multi-task framework, we jointly train the network with an SI-SDR loss and a cross-entropy speaker classification loss, and then apply the attention mechanism to further suppress background voices in the mixed speech. We evaluate the results not only with the distortion metrics SI-SDR and SDR but also with a speech recognition system. To train the model and demonstrate its effectiveness, we build a background-voice removal dataset based on a common public dataset. Experimental results show that our model significantly improves the performance of the speech separation model.
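The joint training objective described in the abstract (SI-SDR separation loss combined with a cross-entropy speaker classification loss) can be illustrated with a minimal sketch. This is an assumption-laden illustration in PyTorch, not the paper's implementation: the function names, the weighting factor alpha, and the assumption that the network outputs both an estimated waveform and speaker logits are all hypothetical.

```python
import torch
import torch.nn.functional as F

def si_sdr_loss(est, ref, eps=1e-8):
    """Negative scale-invariant SDR (higher SI-SDR -> lower loss).
    est, ref: (batch, samples) waveforms."""
    # Zero-mean both signals so the measure is scale-invariant.
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference (optimal scaling).
    dot = torch.sum(est * ref, dim=-1, keepdim=True)
    energy = torch.sum(ref ** 2, dim=-1, keepdim=True) + eps
    target = dot / energy * ref
    noise = est - target
    ratio = (target ** 2).sum(dim=-1) / ((noise ** 2).sum(dim=-1) + eps)
    return -10.0 * torch.log10(ratio + eps).mean()

def joint_loss(est_wav, ref_wav, spk_logits, spk_labels, alpha=0.1):
    """Multi-task objective: separation quality plus speaker classification.
    alpha is a placeholder weight, not a value taken from the paper."""
    sep = si_sdr_loss(est_wav, ref_wav)
    cls = F.cross_entropy(spk_logits, spk_labels)
    return sep + alpha * cls
```

Under this reading, minimizing the combined loss pushes the separator toward high-fidelity target speech while the auxiliary speaker classification task encourages speaker-discriminative features that the attention module can exploit.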
Pages: 111-119
Number of pages: 9
Related Papers
50 records in total (items [21]-[30] shown)
  • [21] Speech Enhancement for Multimodal Speaker Diarization System
    Ahmad, Rehan
    Zubair, Syed
    Alquhayz, Hani
    IEEE ACCESS, 2020, 8 : 126671 - 126680
  • [22] SEF-Net: Speaker Embedding Free Target Speaker Extraction Network
    Zeng, Bang
    Suo, Hongbin
    Wan, Yulong
    Li, Ming
    INTERSPEECH 2023, 2023, : 3452 - 3456
  • [23] Gated Cross-Attention for Universal Speaker Extraction: Toward Real-World Applications
    Zhang, Yiru
    Liu, Bijing
    Yang, Yong
    Yang, Qun
    ELECTRONICS, 2024, 13 (11)
  • [24] Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion
    Li, Xiao
    Liu, Ruirui
    Huang, Huichou
    Wu, Qingyao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 178 - 188
  • [25] A Pitch-aware Speaker Extraction Serial Network
    Jiang, Yu
    Ge, Meng
    Wang, Longbiao
    Dang, Jianwu
    Honda, Kiyoshi
    Zhang, Sulin
    Yu, Bo
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 616 - 620
  • [26] Speaker Adaptation for Attention-Based End-to-End Speech Recognition
    Meng, Zhong
    Gaur, Yashesh
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2019, 2019, : 241 - 245
  • [27] SPEAKER REINFORCEMENT USING TARGET SOURCE EXTRACTION FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Zorila, Catalin
    Doddipatla, Rama
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6297 - 6301
  • [28] Speaker-independent auditory attention decoding without access to clean speech sources
    Han, Cong
    O'Sullivan, James
    Luo, Yi
    Herrero, Jose
    Mehta, Ashesh D.
    Mesgarani, Nima
    SCIENCE ADVANCES, 2019, 5 (05)
  • [29] Speaker Attractor Network: Generalizing Speech Separation to Unseen Numbers of Sources
    Jiang, Fei
    Duan, Zhiyao
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 1859 - 1863
  • [30] IMPROVING SPEAKER DISCRIMINATION OF TARGET SPEECH EXTRACTION WITH TIME-DOMAIN SPEAKERBEAM
    Delcroix, Marc
    Ochiai, Tsubasa
    Zmolikova, Katerina
    Kinoshita, Keisuke
    Tawara, Naohiro
    Nakatani, Tomohiro
    Araki, Shoko
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 691 - 695