Speaker Extraction with Detection of Presence and Absence of Target Speakers

Cited by: 1

Authors:

Zhang, Ke [1,2]

Borsdorf, Marvin [3]

Pan, Zexu [2]

Li, Haizhou [2,3,4]

Wei, Yangjie [1]

Wang, Yi [1]

Affiliations:

[1] Northeastern Univ, Key Lab Intelligent Comp Med Image, Shenyang, Liaoning, Peoples R China

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore

[3] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany

[4] Chinese Univ Hong Kong, SDS, SRIBD, Shenzhen, Peoples R China

Source:

INTERSPEECH 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

cocktail party problem; target speaker extraction; speaker detection; selective auditory attention; absent speaker; SPEECH; VERIFICATION; ATTENTION; SINGLE;

DOI:

10.21437/Interspeech.2023-655

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), and the resulting models often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but degrades the PT performance as the model intrinsically tries to detect the target presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, which allows the detection branch to use a pre-separated signal and keeps the overall processing pipeline length similar, while at the high level they have different branches to ensure the performance of each task. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. The TSEJoint model shows better extraction performance under the PT condition and better detection performance under all conditions compared with the baseline.

Pages: 3714-3718

Page count: 5
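
Illustrative sketch:

The abstract describes a design in which detection and extraction share low-level modules and then split into task-specific branches. The toy example below is only a rough illustration of that general idea, not the TSEJoint architecture itself: the class `JointSketch`, its layer sizes, the random untrained weights, the mask-based extraction, and the 0.5 presence threshold are all assumptions made here for illustration and do not come from the paper.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class JointSketch:
    """Hypothetical toy model: one shared trunk, two task-specific heads."""

    def __init__(self, frame_dim, spk_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Shared low-level module used by both tasks (untrained weights).
        self.shared = random_matrix(hidden_dim, frame_dim + spk_dim, rng)
        # High-level branch for extraction.
        self.extract_head = random_matrix(frame_dim, hidden_dim, rng)
        # High-level branch for presence detection.
        self.detect_head = random_matrix(1, hidden_dim, rng)

    def forward(self, mixture_frame, speaker_embedding):
        # Shared processing, conditioned on the target speaker embedding.
        hidden = relu(linear(self.shared, mixture_frame + speaker_embedding))
        # Extraction branch: a mask in (0, 1) applied to the mixture frame.
        mask = [sigmoid(v) for v in linear(self.extract_head, hidden)]
        extracted = [m * x for m, x in zip(mask, mixture_frame)]
        # Detection branch: estimated probability that the target is present.
        presence = sigmoid(linear(self.detect_head, hidden)[0])
        return extracted, presence


mixture = [0.2, -0.1, 0.4, 0.0]
model = JointSketch(frame_dim=4, spk_dim=3, hidden_dim=8)
extracted, presence = model.forward(mixture, [1.0, 0.0, 0.5])

# Assumed post-processing: output silence when the target is judged absent.
output = extracted if presence >= 0.5 else [0.0] * len(extracted)
```

Because both heads read the same hidden representation, the detection head adds little extra computation on top of extraction, which loosely mirrors the abstract's remark about keeping the overall pipeline length similar.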
