Speaker Extraction with Detection of Presence and Absence of Target Speakers

Times cited: 1

Authors:

Zhang, Ke [1,2]

Borsdorf, Marvin [3]

Pan, Zexu [2]

Li, Haizhou [2,3,4]

Wei, Yangjie [1]

Wang, Yi [1]

Affiliations:

[1] Northeastern Univ, Key Lab Intelligent Comp Med Image, Shenyang, Liaoning, Peoples R China

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore

[3] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany

[4] Chinese Univ Hong Kong, SDS, SRIBD, Shenzhen, Peoples R China

Source:

INTERSPEECH 2023 | 2023

Funding:

National Natural Science Foundation of China

Keywords:

cocktail party problem; target speaker extraction; speaker detection; selective auditory attention; absent speaker; SPEECH; VERIFICATION; ATTENTION; SINGLE

DOI:

10.21437/Interspeech.2023-655

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), and the resulting models often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but it degrades PT performance because the model intrinsically tries to detect the target's presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, which lets the detection branch use a pre-separated signal and keeps the overall processing pipeline length similar, while at the high level the tasks use different branches to ensure the performance of each. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. Compared with the baseline, the TSEJoint model shows better extraction performance under the PT condition and better detection performance under all conditions.
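The abstract describes the TSEJoint architecture only at a high level: shared low-level modules whose output feeds separate high-level detection and extraction branches. The Python sketch below illustrates that data flow under stated assumptions and is not the authors' implementation. Every function name (`shared_low_level`, `extraction_branch`, `detection_branch`, `joint_inference`), the toy signal operations standing in for learned networks, the constants `bias` and `scale`, and the policy of returning silence when the target is judged absent are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def shared_low_level(mixture: Sequence[float], speaker_emb: Sequence[float]) -> List[float]:
    # Hypothetical shared module: a toy "pre-separation" that scales the mixture
    # by a gain derived from the reference speaker embedding. A real system would
    # use a learned encoder/separator shared by both tasks.
    gain = sum(speaker_emb) / len(speaker_emb)
    return [gain * x for x in mixture]


def extraction_branch(pre_separated: Sequence[float]) -> List[float]:
    # Hypothetical high-level extraction head: identity here, a learned
    # refinement network in practice.
    return list(pre_separated)


def detection_branch(pre_separated: Sequence[float],
                     bias: float = 0.1, scale: float = 50.0) -> float:
    # Hypothetical high-level detection head: maps the mean energy of the
    # pre-separated signal to a target-presence probability with a logistic
    # function. `bias` and `scale` are illustrative, not learned values.
    energy = sum(x * x for x in pre_separated) / max(len(pre_separated), 1)
    return 1.0 / (1.0 + math.exp(-scale * (energy - bias)))


def joint_inference(mixture: Sequence[float], speaker_emb: Sequence[float],
                    threshold: float = 0.5) -> Tuple[List[float], float]:
    # Both branches read the same shared output, so detection operates on a
    # pre-separated signal without adding a separate front end.
    shared = shared_low_level(mixture, speaker_emb)
    p_present = detection_branch(shared)
    if p_present >= threshold:
        # Target judged present (PT): return the extracted estimate.
        return extraction_branch(shared), p_present
    # Target judged absent (AT): return silence of the same length (assumed policy).
    return [0.0] * len(mixture), p_present
```

For example, `joint_inference([0.5, -0.5, 0.5, -0.5], [1.0, 1.0])` returns the pre-separated signal together with a presence probability close to 1, while an all-zero mixture yields silence and a probability near 0.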


Pages: 3714-3718

Number of pages: 5
