Speaker Extraction with Detection of Presence and Absence of Target Speakers

Cited by: 1
Authors
Zhang, Ke [1, 2]
Borsdorf, Marvin [3]
Pan, Zexu [2]
Li, Haizhou [2, 3, 4]
Wei, Yangjie [1]
Wang, Yi [1]
Affiliations
[1] Northeastern Univ, Key Lab Intelligent Comp Med Image, Shenyang, Liaoning, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany
[4] Chinese Univ Hong Kong, SDS, SRIBD, Shenzhen, Peoples R China
Source
INTERSPEECH 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
cocktail party problem; target speaker extraction; speaker detection; selective auditory attention; absent speaker; SPEECH; VERIFICATION; ATTENTION; SINGLE;
DOI
10.21437/Interspeech.2023-655
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), and the resulting models often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but degrades the PT performance as the model intrinsically tries to detect the target's presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, which allows the detection branch to use a pre-separated signal and keeps the overall processing pipeline length similar, while at the high level they use separate branches to preserve the performance of each task. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. The TSEJoint model shows better extraction performance under the PT condition and better detection performance under all conditions compared with the baseline.
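The data flow described in the abstract (shared low-level modules feeding two separate high-level branches) can be illustrated with a toy sketch. This is not the authors' implementation: every function name, the energy-based detector, the `threshold` value, and the choice to output silence when the target is judged absent are assumptions made only to show the shape of a joint detection and extraction pipeline.

```python
import math

def shared_front_end(mixture, speaker_gain):
    """Hypothetical stand-in for the shared low-level modules.

    Returns a 'pre-separated' signal. A real model would condition on a
    target speaker embedding; here a scalar gain plays that role.
    """
    return [speaker_gain * x for x in mixture]

def extraction_branch(pre_separated):
    """Hypothetical high-level extraction branch (identity placeholder)."""
    return list(pre_separated)

def detection_branch(pre_separated, scale=10.0, bias=-1.0):
    """Hypothetical high-level detection branch.

    Maps the mean energy of the pre-separated signal to a presence
    probability with a sigmoid. The scale and bias are illustrative.
    """
    energy = sum(x * x for x in pre_separated) / len(pre_separated)
    return 1.0 / (1.0 + math.exp(-(scale * energy + bias)))

def tse_joint_sketch(mixture, speaker_gain, threshold=0.5):
    """Run both branches on the shared pre-separated signal.

    Assumption: when the detector judges the target absent, the output
    is replaced by silence.
    """
    pre = shared_front_end(mixture, speaker_gain)
    presence = detection_branch(pre)
    estimate = extraction_branch(pre)
    if presence < threshold:
        estimate = [0.0] * len(estimate)
    return estimate, presence

mixture = [0.5, -0.5, 0.5, -0.5]
present_out, present_prob = tse_joint_sketch(mixture, speaker_gain=1.0)
absent_out, absent_prob = tse_joint_sketch(mixture, speaker_gain=0.0)
```

In this sketch both branches consume the same pre-separated signal, so the detector adds little extra processing on top of the extraction path, which mirrors the abstract's point about keeping the overall pipeline length similar.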
Pages: 3714-3718
Number of pages: 5