Denoi-SpEx+: A Speaker Extraction Network based Speech Dialogue System

Cited by: 0
Authors
Hao, Yun [1 ,2 ]
Huang, Xiangkang [1 ,2 ]
Huang, Huichou [4 ]
Wu, Qingyao [1 ,3 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] Minist Educ, Key Lab Big Data & Intelligent Robot, Guangzhou, Peoples R China
[3] Pazhou Lab, Guangzhou, Peoples R China
[4] City Univ Hong Kong, Hong Kong, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING (ICEBE 2021) | 2021
Funding
National Natural Science Foundation of China;
Keywords
speech dialogue system; speech separation; Denoi-SpEx; SEPARATION; ENHANCEMENT;
DOI
10.1109/ICEBE52470.2021.00030
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
Speech dialogue systems have gradually become widespread in daily life, letting users consult and communicate with a system through natural language. In practical applications, however, real dialogue scenes contain interfering third-party speech and background noise, and the uncertainty and complexity of these background sounds degrade the system's recognition accuracy. A good speech enhancement module can separate the target speaker from the original speech. Recently, a time-domain solution called SpEx+ was proposed, but SpEx+ needs a reference speech to assist training, and in real applications this reference speech may itself contain noise that hurts performance. We therefore propose the Denoi-SpEx+ model: a speech denoising network is added before the reference speech is input to the network, so that the quality of speech separation can be guaranteed in practical applications. Experiments show that our model significantly improves the performance of the speech separation model when the reference speech is noisy.
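The pipeline the abstract describes can be sketched as follows. This is a toy illustration only: the function names (`denoise`, `speaker_encoder`, `extract`) and their trivial signal processing are hypothetical stand-ins for the paper's neural components, chosen solely to show where Denoi-SpEx+ inserts the denoiser relative to the original SpEx+ flow.

```python
# Toy sketch of the Denoi-SpEx+ idea: clean the reference utterance
# BEFORE computing the speaker embedding used for extraction.
# All components below are illustrative placeholders, not the authors' networks.

def denoise(ref):
    """Stand-in denoiser: 3-tap moving average over the reference signal."""
    padded = [ref[0]] + list(ref) + [ref[-1]]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(ref) + 1)]

def speaker_encoder(ref):
    """Stand-in speaker embedding: coarse summary statistics of the reference."""
    mean = sum(ref) / len(ref)
    energy = sum(x * x for x in ref) / len(ref)
    return (mean, energy)

def extract(mixture, embedding):
    """Stand-in extractor: gate the mixture with a gain derived from the
    embedding (a real time-domain system would predict a per-sample mask)."""
    gain = embedding[1] / (embedding[1] + 1.0)
    return [x * gain for x in mixture]

def denoi_spex_plus(mixture, noisy_ref):
    # Key difference from plain SpEx+: the reference speech is denoised
    # first, so a noisy enrollment utterance does not corrupt the speaker
    # embedding that drives target-speaker extraction.
    clean_ref = denoise(noisy_ref)
    embedding = speaker_encoder(clean_ref)
    return extract(mixture, embedding)
```

In plain SpEx+ the noisy reference would feed the speaker encoder directly; Denoi-SpEx+ simply interposes the denoising stage, leaving the rest of the extraction pipeline unchanged.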
Pages: 49-53
Number of pages: 5