Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism

被引:1
|
作者
Wang Sijie [1 ,2 ]
Hamdulla, Askar [1 ,2 ]
Ablimit, Mijit [1 ,2 ]
机构
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Key Lab Signal Detect & Proc, Urumqi, Peoples R China
来源
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023年
关键词
target speaker extraction; attention; gated fusion; multi-task learning; NETWORK;
D O I
10.1109/APSIPAASC58517.2023.10317106
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The objective of a target speaker extraction system is to extract the speech of the target speaker from a mixture of multiple speakers and noises using a certain amount of additional information of the target speaker. In this paper, we investigate the improvements of the baseline system by incorporating the light-weight CBAM module in the target extractor, and the gated fusion module (GFM) in the fusion layer. The CBAM introduces attention enhancement to baseline model with no significant increase in the number of parameters and complexity, and the previous concatenation-based fusion method used for speaker embedding and input mixture (or intermediate output) is replaced by GFM, enabling the model to better leverage the supplementary information provided by speaker embedding. Experimental results on datasets built from WSJ0-2mix and WHAM! demonstrate that both the CBAM module and the light-weight GFM module individually improve the model performance, and the GFM module shows better improvement on WHAM!. However, the combination of these two modules only exhibits mutually beneficial effects on the clean dataset WSJ0-2mix, while the performance of the combined module on the noisy dataset WHAM! is inferior to that of using the GFM module alone.
引用
收藏
页码:1995 / 2001
页数:7
相关论文
共 50 条
  • [1] MULTIMODAL ATTENTION FUSION FOR TARGET SPEAKER EXTRACTION
    Sato, Hiroshi
    Ochiai, Tsubasa
    Kinoshita, Keisuke
    Delcroix, Marc
    Nakatani, Tomohiro
    Araki, Shoko
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 778 - 784
  • [2] Hierarchic Temporal Convolutional Network with Attention Fusion for Target Speaker Extraction
    Chen, Zihao
    Qiu, Wenbo
    Xu, Haitao
    Hu, Ying
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 827 - 832
  • [3] Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion
    Li, Xiao
    Liu, Ruirui
    Huang, Huichou
    Wu, Qingyao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 178 - 188
  • [4] Speaker extraction network with attention mechanism for speech dialogue system
    Hao, Yun
    Wu, Jiaju
    Huang, Xiangkang
    Zhang, Zijia
    Liu, Fei
    Wu, Qingyao
    SERVICE ORIENTED COMPUTING AND APPLICATIONS, 2022, 16 (02) : 111 - 119
  • [5] Speaker extraction network with attention mechanism for speech dialogue system
    Yun Hao
    Jiaju Wu
    Xiangkang Huang
    Zijia Zhang
    Fei Liu
    Qingyao Wu
    Service Oriented Computing and Applications, 2022, 16 : 111 - 119
  • [6] Target Speaker Extraction Using Attention-Enhanced Temporal Convolutional Network
    Wang, Jian-Hong
    Lai, Yen-Ting
    Tai, Tzu-Chiang
    Le, Phuong Thi
    Pham, Tuan
    Wang, Ze-Yu
    Li, Yung-Hui
    Wang, Jia-Ching
    Chang, Pao-Chi
    Botzheim, Janos
    ELECTRONICS, 2024, 13 (02)
  • [7] Gated Cross-Attention for Universal Speaker Extraction: Toward Real-World Applications
    Zhang, Yiru
    Liu, Bijing
    Yang, Yong
    Yang, Qun
    ELECTRONICS, 2024, 13 (11)
  • [8] SPEAKER-CONDITIONING SINGLE-CHANNEL TARGET SPEAKER EXTRACTION USING CONFORMER-BASED ARCHITECTURES
    Sinha, Ragini
    Tammen, Marvin
    Rollwage, Christian
    Doclo, Simon
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [9] Speaker Extraction with Detection of Presence and Absence of Target Speakers
    Zhang, Ke
    Borsdorf, Marvin
    Pan, Zexu
    Li, Haizhou
    Wei, Yangjie
    Wang, Yi
    INTERSPEECH 2023, 2023, : 3714 - 3718
  • [10] Target Speaker Extraction for Multi-Talker Speaker Verification
    Rao, Wei
    Xu, Chenglin
    Chng, Eng Siong
    Li, Haizhou
    INTERSPEECH 2019, 2019, : 1273 - 1277