Speaker Extraction with Detection of Presence and Absence of Target Speakers

Citations: 1
Authors
Zhang, Ke [1 ,2 ]
Borsdorf, Marvin [3 ]
Pan, Zexu [2 ]
Li, Haizhou [2 ,3 ,4 ]
Wei, Yangjie [1 ]
Wang, Yi [1 ]
Affiliations
[1] Northeastern Univ, Key Lab Intelligent Comp Med Image, Shenyang, Liaoning, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany
[4] Chinese Univ Hong Kong, SDS, SRIBD, Shenzhen, Peoples R China
Source
INTERSPEECH 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
cocktail party problem; target speaker extraction; speaker detection; selective auditory attention; absent speaker; SPEECH; VERIFICATION; ATTENTION; SINGLE;
DOI
10.21437/Interspeech.2023-655
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), and the resulting models often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but it degrades PT performance because the model intrinsically tries to detect the target presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. The two tasks share the low-level modules, which lets the detection branch use a pre-separated signal and keeps the overall processing pipeline at a similar length, while at the high level they use separate branches to preserve the performance of each task. We evaluate the proposed method under PT and AT conditions comprising one and two talkers. Compared with the baseline, the TSEJoint model shows better extraction performance under the PT condition and better detection performance under all conditions.
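The shared-trunk, two-branch layout described in the abstract can be illustrated with a toy sketch. This is not the TSEJoint network: the function names (`shared_low_level`, `extraction_branch`, `detection_branch`, `tse_joint_sketch`), the arithmetic inside them, and the 0.5 decision threshold are all hypothetical stand-ins chosen only to show the data flow, in which the low-level features are computed once and then consumed by both an extraction head and a detection head.

```python
import math


def shared_low_level(mixture, speaker_embedding):
    """Hypothetical shared trunk: conditions each mixture sample on the speaker cue."""
    bias = sum(speaker_embedding) / len(speaker_embedding)
    return [math.tanh(x + bias) for x in mixture]


def extraction_branch(features, mixture):
    """Hypothetical extraction head: a sigmoid mask in (0, 1) applied to the mixture."""
    return [x / (1.0 + math.exp(-f)) for f, x in zip(features, mixture)]


def detection_branch(features):
    """Hypothetical detection head: pools the features into a presence probability."""
    pooled = sum(abs(f) for f in features) / len(features)
    return 1.0 / (1.0 + math.exp(-(4.0 * pooled - 2.0)))


def tse_joint_sketch(mixture, speaker_embedding, threshold=0.5):
    """Run both heads on one pass of the shared trunk (threshold is an assumption)."""
    features = shared_low_level(mixture, speaker_embedding)  # computed once, reused twice
    estimate = extraction_branch(features, mixture)          # extraction output
    p_present = detection_branch(features)                   # detection output
    return estimate, p_present, p_present >= threshold


if __name__ == "__main__":
    est, prob, present = tse_joint_sketch([0.3, -0.5, 0.8, -0.1], [0.2, 0.4])
    print(len(est), round(prob, 3), present)
```

Because both heads read the same `features`, adding detection costs only the extra head rather than a second full pass, which is the property the abstract attributes to sharing the low-level modules.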
Pages: 3714 - 3718
Page count: 5
Related papers
50 in total
  • [41] NEUROSPEX: NEURO-GUIDED SPEAKER EXTRACTION WITH CROSS-MODAL FUSION
    De Silva, Dashanka
    Cai, Siqi
    Pahuja, Saurav
    Schultz, Tanja
    Li, Haizhou
    2024 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2024, : 341 - 348
  • [42] DIRECTIONAL TARGET SPEAKER EXTRACTION UNDER NOISY UNDERDETERMINED CONDITIONS THROUGH CONDITIONAL VARIATIONAL AUTOENCODER WITH GLOBAL STYLE TOKENS
    Wang, Rui
    Toda, Tomoki
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023,
  • [43] wTIMIT2mix: A Cocktail Party Mixtures Database to Study Target Speaker Extraction for Normal and Whispered Speech
    Borsdorf, Marvin
    Pan, Zexu
    Li, Haizhou
    Schultz, Tanja
    INTERSPEECH 2024, 2024, : 5038 - 5042
  • [44] A new architecture based VAD for speaker diarization/detection systems
    Kenai, Ouassila
    Ouamour, Siham
    Guerti, Mhania
    Asbai, Nassim
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22 (03) : 827 - 840
  • [45] A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments
    Hao, Yunzhe
    Xu, Jiaming
    Shi, Jing
    Zhang, Peng
    Qin, Lei
    Xu, Bo
    INTERSPEECH 2020, 2020, : 1431 - 1435
  • [46] Comparison of Speaker Adaptation Methods as Feature Extraction for SVM-Based Speaker Recognition
    Ferras, Marc
    Leung, Cheung-Chi
    Barras, Claude
    Gauvain, Jean-Luc
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06) : 1366 - 1378
  • [47] A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement
    Ravi, Vijay
    Wang, Jinhan
    Flint, Jonathan
    Alwan, Abeer
    INTERSPEECH 2022, 2022, : 3338 - 3342
  • [48] UBM based speaker segmentation and clustering for 2-speaker detection
    Deng, Jing
    Zheng, Thomas Fang
    Wu, Wenhu
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 116 - +
  • [49] TARGET LANGUAGE EXTRACTION AT MULTILINGUAL COCKTAIL PARTIES
    Borsdorf, Marvin
    Li, Haizhou
    Schultz, Tanja
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 717 - 724
  • [50] AvaTr: One-Shot Speaker Extraction with Transformers
    Hu, Shell Xu
    Arefin, Md Rifat
    Viet-Nhat Nguyen
    Dipani, Alish
    Pitkow, Xaq
    Tolias, Andreas Savas
    INTERSPEECH 2021, 2021, : 3510 - 3514