Speaker Extraction with Detection of Presence and Absence of Target Speakers

Cited by: 1

Authors:

Zhang, Ke ^{[1,2]}

Borsdorf, Marvin ^{[3]}

Pan, Zexu ^{[2]}

Li, Haizhou ^{[2,3,4]}

Wei, Yangjie ^{[1]}

Wang, Yi ^{[1]}

Affiliations:

[1] Northeastern Univ, Key Lab Intelligent Comp Med Image, Shenyang, Liaoning, Peoples R China

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore

[3] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany

[4] Chinese Univ Hong Kong, SDS, SRIBD, Shenzhen, Peoples R China

Source:

INTERSPEECH 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

cocktail party problem; target speaker extraction; speaker detection; selective auditory attention; absent speaker; SPEECH; VERIFICATION; ATTENTION; SINGLE;

DOI:

10.21437/Interspeech.2023-655

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), and the resulting models often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but degrades the PT performance as the model intrinsically tries to detect the target presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, allowing the detection branch to use a pre-separated signal and keeping the overall processing pipeline length similar, while at the high level they have different branches to ensure the performance of each task. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. The TSEJoint model shows better extraction performance under the PT condition and better detection performance on all conditions compared with the baseline.


Pages: 3714-3718

Page count: 5
