Speaker Extraction with Detection of Presence and Absence of Target Speakers

Cited by: 1
Authors
Zhang, Ke [1 ,2 ]
Borsdorf, Marvin [3 ]
Pan, Zexu [2 ]
Li, Haizhou [2 ,3 ,4 ]
Wei, Yangjie [1 ]
Wang, Yi [1 ]
Affiliations
[1] Northeastern Univ, Key Lab Intelligent Comp Med Image, Shenyang, Liaoning, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] Univ Bremen, Machine Listening Lab MLL, Bremen, Germany
[4] Chinese Univ Hong Kong, SDS, SRIBD, Shenzhen, Peoples R China
Source
INTERSPEECH 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
cocktail party problem; target speaker extraction; speaker detection; selective auditory attention; absent speaker; SPEECH; VERIFICATION; ATTENTION; SINGLE;
DOI
10.21437/Interspeech.2023-655
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Target speaker extraction extracts a target voice from a given cocktail party mixture signal. Most studies are restricted to conditions in which the target speaker is present in the mixture (PT), which often fail when the target speaker is absent (AT). Training on both PT and AT situations helps, but degrades the PT performance as the model intrinsically tries to detect the target presence. We propose a new model, called TSEJoint, that jointly performs target speaker detection and extraction. Both tasks share the low-level modules, allowing the detection branch to use a pre-separated signal and keeping the overall processing pipeline length similar, while at the high-level they have different branches to ensure the performance of each task. We evaluate our proposed methods under PT and AT conditions comprising one and two talkers. The TSEJoint model shows better extraction performance under the PT condition and better detection performance on all conditions compared with the baseline.
Pages: 3714 - 3718
Page count: 5
Related papers
50 in total
  • [21] Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion
    Li, Xiao
    Liu, Ruirui
    Huang, Huichou
    Wu, Qingyao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 178 - 188
  • [22] Target Active Speaker Detection with Audio-visual Cues
    Jiang, Yidi
    Tao, Ruijie
    Pan, Zexu
    Li, Haizhou
    INTERSPEECH 2023, 2023, : 3152 - 3156
  • [23] Target Speaker Extraction Using Attention-Enhanced Temporal Convolutional Network
    Wang, Jian-Hong
    Lai, Yen-Ting
    Tai, Tzu-Chiang
    Le, Phuong Thi
    Pham, Tuan
    Wang, Ze-Yu
    Li, Yung-Hui
    Wang, Jia-Ching
    Chang, Pao-Chi
    Botzheim, Janos
    ELECTRONICS, 2024, 13 (02)
  • [24] Coarse-to-Fine Target Speaker Extraction Based on Contextual Information Exploitation
    Yang, Xue
    Bao, Changchun
    Chen, Xianhong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3795 - 3810
  • [25] NeuroHeed: Neuro-Steered Speaker Extraction Using EEG Signals
    Pan, Zexu
    Borsdorf, Marvin
    Cai, Siqi
    Schultz, Tanja
    Li, Haizhou
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4456 - 4470
  • [26] Speakers In The Wild (SITW): The QUT Speaker Recognition System
    Ghaemmaghami, H.
    Rahman, M. H.
    Himawan, I.
    Dean, D.
    Kanagasundaram, A.
    Sridharan, S.
    Fookes, C.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 838 - 842
  • [27] TRAINING STRATEGIES FOR MODALITY DROPOUT RESILIENT MULTI-MODAL TARGET SPEAKER EXTRACTION
    Korse, Srikanth
    Elminshawi, Mohamed
    Habets, Emanuel A. P.
    Chetupalli, Srikanth Raj
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 595 - 599
  • [28] X-TF-GridNet: A time-frequency domain target speaker extraction network with adaptive speaker embedding fusion
    Hao, Fengyuan
    Li, Xiaodong
    Zheng, Chengshi
    INFORMATION FUSION, 2024, 112
  • [29] Speaker-Specific Articulatory Feature Extraction Based on Knowledge Distillation for Speaker Recognition
    Hong, Qian-Bei
    Wu, Chung-Hsien
    Wang, Hsin-Min
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2023, 12 (02)
  • [30] Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker
    He, Maokui
    Raj, Desh
    Huang, Zili
    Du, Jun
    Chen, Zhuo
    Watanabe, Shinji
    INTERSPEECH 2021, 2021, : 3555 - 3559