Speaker Extraction with Detection of Presence and Absence of Target Speakers

Cited by: 1

Authors:

Zhang, Ke [1,2]

Borsdorf, Marvin [3]

Pan, Zexu [2]

Li, Haizhou [2,3,4]

Wei, Yangjie [1]

Wang, Yi [1]

Affiliations:

[1] Northeastern Univ, Key Lab Intelligent Comp Med Image, Shenyang, Liaoning, Peoples R China

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore

[3] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany

[4] Chinese Univ Hong Kong, SDS, SRIBD, Shenzhen, Peoples R China

Source:

INTERSPEECH 2023 | 2023

Funding:

National Natural Science Foundation of China

Keywords:

cocktail party problem; target speaker extraction; speaker detection; selective auditory attention; absent speaker; SPEECH; VERIFICATION; ATTENTION; SINGLE;

DOI:

10.21437/Interspeech.2023-655

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), and the resulting models often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but degrades the PT performance as the model intrinsically tries to detect the target presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, which allows the detection branch to use a pre-separated signal and keeps the overall processing pipeline length similar, while at the high level they have separate branches to ensure the performance of each task. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. The TSEJoint model shows better extraction performance under the PT condition and better detection performance under all conditions compared with the baseline.
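
The structure described in the abstract (shared low-level modules feeding a detection branch and an extraction branch) can be illustrated with a toy sketch. This is not the TSEJoint model: the projection-based "trunk", the energy-ratio detector, the threshold value, the choice to output silence when the target is judged absent, and all function names are assumptions made only for illustration. Signals are plain lists of equal length.

```python
from typing import List, Tuple

Signal = List[float]


def dot(a: Signal, b: Signal) -> float:
    """Inner product of two equal-length signals."""
    return sum(x * y for x, y in zip(a, b))


def shared_trunk(mixture: Signal, reference: Signal) -> Signal:
    """Toy stand-in for the shared low-level modules (assumption).

    Projects the mixture onto the reference cue and returns the
    result as a 'pre-separated' signal used by both branches.
    """
    ref_energy = dot(reference, reference)
    if ref_energy == 0.0:
        return [0.0] * len(mixture)
    gain = dot(mixture, reference) / ref_energy
    return [gain * r for r in reference]


def detection_branch(mixture: Signal, pre_separated: Signal,
                     threshold: float = 0.1) -> bool:
    """Toy detector (assumption): the target counts as present when the
    pre-separated signal holds more than `threshold` of the mixture energy."""
    mix_energy = dot(mixture, mixture)
    if mix_energy == 0.0:
        return False
    return dot(pre_separated, pre_separated) / mix_energy > threshold


def extraction_branch(pre_separated: Signal) -> Signal:
    """Toy extractor (assumption): passes the pre-separated signal through."""
    return list(pre_separated)


def joint_detect_and_extract(mixture: Signal,
                             reference: Signal) -> Tuple[bool, Signal]:
    """Run the shared trunk once, then both task-specific branches."""
    pre = shared_trunk(mixture, reference)
    present = detection_branch(mixture, pre)
    extracted = extraction_branch(pre)
    if not present:
        # Assumption: emit silence when the target is judged absent (AT).
        extracted = [0.0] * len(mixture)
    return present, extracted


# Hypothetical signals: the interferer is orthogonal to the reference.
reference = [1.0, -1.0, 1.0, -1.0]
interferer = [1.0, 1.0, -1.0, -1.0]
pt_mixture = [0.5 * r + i for r, i in zip(reference, interferer)]  # target present
at_mixture = list(interferer)                                      # target absent

print(joint_detect_and_extract(pt_mixture, reference))
print(joint_detect_and_extract(at_mixture, reference))
```

The point of the sketch is only that the trunk is computed once and reused, so the detection branch works on a pre-separated signal rather than on the raw mixture, while each task keeps its own final stage.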


Pages: 3714-3718

Page count: 5

50 items in total

[1] Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers
Borsdorf, Marvin
Xu, Chenglin
Li, Haizhou
Schultz, Tanja
INTERSPEECH 2021, 2021, : 1469 - 1473
[2] Binaural Selective Attention Model for Target Speaker Extraction
Meng, Hanyu
Zhang, Qiquan
Zhang, Xiangyu
Sethu, Vidhyasaharan
Ambikairajah, Eliathamby
INTERSPEECH 2024, 2024, : 4323 - 4327
[3] Target Speaker Extraction for Multi-Talker Speaker Verification
Rao, Wei
Xu, Chenglin
Chng, Eng Siong
Li, Haizhou
INTERSPEECH 2019, 2019, : 1273 - 1277
[4] Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism
Wang, Sijie
Hamdulla, Askar
Ablimit, Mijit
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1995 - 2001
[5] Focus the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information
Lin, Jiuxin
Wang, Peng
Dinkel, Heinrich
Chen, Jun
Wu, Zhiyong
Wang, Yongqing
Yan, Zhiyong
Zhang, Junbo
Wang, Yujun
INTERSPEECH 2023, 2023, : 2488 - 2492
[6] MULTIMODAL ATTENTION FUSION FOR TARGET SPEAKER EXTRACTION
Sato, Hiroshi
Ochiai, Tsubasa
Kinoshita, Keisuke
Delcroix, Marc
Nakatani, Tomohiro
Araki, Shoko
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 778 - 784
[7] Target Speaker Extraction by Fusing Voiceprint Features
Cheng, Shidan
Shen, Ying
Wang, Dongqing
APPLIED SCIENCES-BASEL, 2022, 12 (16):
[8] MUSE: MULTI-MODAL TARGET SPEAKER EXTRACTION WITH VISUAL CUES
Pan, Zexu
Tao, Ruijie
Xu, Chenglin
Li, Haizhou
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6678 - 6682
[9] WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Wang, Shuai
Zhang, Ke
Lin, Shaoxiong
Li, Junjie
Wang, Xuefei
Ge, Meng
Yu, Jianwei
Qian, Yanmin
Li, Haizhou
INTERSPEECH 2024, 2024, : 4273 - 4277
[10] SPEAKER-CONDITIONING SINGLE-CHANNEL TARGET SPEAKER EXTRACTION USING CONFORMER-BASED ARCHITECTURES
Sinha, Ragini
Tammen, Marvin
Rollwage, Christian
Doclo, Simon
2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
