wTIMIT2mix: A Cocktail Party Mixtures Database to Study Target Speaker Extraction for Normal and Whispered Speech

Cited by: 0
Authors
Borsdorf, Marvin [1 ]
Pan, Zexu [2 ]
Li, Haizhou [1 ,3 ,4 ]
Schultz, Tanja [5 ]
Affiliations
[1] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany
[2] Alibaba Grp, Singapore, Singapore
[3] Chinese Univ Hong Kong, SRIBD, SDS, Shenzhen, Peoples R China
[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[5] Univ Bremen, Cognit Syst Lab CSL, Bremen, Germany
Source
INTERSPEECH 2024 | 2024
Keywords
Speaker extraction; speech separation; cocktail party problem; speech mode; whispered speech; SEPARATION; RECOGNITION;
DOI
10.21437/Interspeech.2024-1172
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Target speaker extraction (TSE) seeks to single out a target speaker's voice from a speech mixture signal with the help of a target reference signal. TSE enables novel speech applications such as smart hearing aids. A TSE system has to work reliably in everyday conversational situations, which may include speakers who switch naturally between normal and whispered speech modes. This work represents the first attempt to perform TSE for whispered speech. To this end, we construct a new, first-of-its-kind database, called wTIMIT2mix, which comprises two-speaker speech mixtures and target speaker reference signals in both normal and whispered speech modes. Our TSE results show that, if these conditions are included in training, a model can be equipped to work under all closed-set conditions.
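The abstract describes two-speaker cocktail-party mixtures paired with a target reference signal, but this record does not state the actual mixing protocol. The sketch below shows one common way such a (mixture, target, reference) triple is assembled from individual utterances; the function name make_mixture, the SNR scaling, and the fully-overlapped truncation are illustrative assumptions, not the authors' wTIMIT2mix recipe.

```python
# Minimal sketch of building a two-speaker mixture plus a target
# reference signal for TSE training. SNR handling, overlap strategy,
# and file layout are assumptions, not the wTIMIT2mix protocol.
import numpy as np
import soundfile as sf


def make_mixture(target_wav, interferer_wav, reference_wav, snr_db=0.0):
    """Mix a target and an interferer utterance at a given SNR and
    return (mixture, target, reference) waveforms."""
    target, sr = sf.read(target_wav)
    interferer, sr_i = sf.read(interferer_wav)
    reference, _ = sf.read(reference_wav)
    assert sr == sr_i, "both utterances must share one sampling rate"

    # Truncate to the shorter utterance (fully overlapped mixture).
    n = min(len(target), len(interferer))
    target, interferer = target[:n], interferer[:n]

    # Scale the interferer so the target-to-interferer ratio equals snr_db.
    t_pow = np.mean(target ** 2) + 1e-8
    i_pow = np.mean(interferer ** 2) + 1e-8
    gain = np.sqrt(t_pow / (i_pow * 10 ** (snr_db / 10)))
    mixture = target + gain * interferer
    return mixture, target, reference


# Example use (hypothetical file names): the reference comes from a
# different utterance of the target speaker, in either speech mode.
# mix, tgt, ref = make_mixture("spk1_normal.wav", "spk2_whisper.wav",
#                              "spk1_enroll.wav", snr_db=2.5)
```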
Pages: 5038-5042
Page count: 5