Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism

Cited by: 1
Authors
Wang, Sijie [1, 2]
Hamdulla, Askar [1, 2]
Ablimit, Mijit [1, 2]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Key Lab Signal Detect & Proc, Urumqi, Peoples R China
Source
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023
Keywords
target speaker extraction; attention; gated fusion; multi-task learning; NETWORK;
DOI
10.1109/APSIPAASC58517.2023.10317106
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
The objective of a target speaker extraction system is to extract the speech of the target speaker from a mixture of multiple speakers and noise, using a certain amount of additional information about the target speaker. In this paper, we investigate improvements to the baseline system obtained by incorporating the lightweight CBAM module in the target extractor and the gated fusion module (GFM) in the fusion layer. The CBAM introduces attention enhancement to the baseline model with no significant increase in the number of parameters or complexity, and the concatenation-based fusion method previously used for the speaker embedding and the input mixture (or intermediate output) is replaced by the GFM, enabling the model to better leverage the supplementary information provided by the speaker embedding. Experimental results on datasets built from WSJ0-2mix and WHAM! demonstrate that both the CBAM module and the lightweight GFM module individually improve model performance, and that the GFM module shows greater improvement on WHAM!. However, the combination of the two modules is mutually beneficial only on the clean dataset WSJ0-2mix; on the noisy dataset WHAM!, the combined modules perform worse than the GFM module alone.
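The abstract contrasts concatenation-based fusion with gated fusion but does not give the GFM's equations. The following is a minimal sketch of one common gating pattern, not the paper's actual module: a sigmoid gate computed from the frame feature and the speaker embedding scales a projection of the embedding before it is added to the feature. All function names, dimensions, and weights here are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Compute W @ vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def concat_fusion(feature, embedding):
    """Baseline-style fusion: append the speaker embedding to the feature."""
    return feature + embedding


def gated_fusion(feature, embedding, w_gate, b_gate, w_proj, b_proj):
    """Assumed gating form (not taken from the paper): a per-dimension
    sigmoid gate decides how much of the projected speaker embedding
    is added to the feature."""
    gate = [sigmoid(g) for g in linear(w_gate, b_gate, feature + embedding)]
    projected = linear(w_proj, b_proj, embedding)
    return [f + g * p for f, g, p in zip(feature, gate, projected)]


# Toy dimensions and random weights, for illustration only.
random.seed(0)
feat_dim, emb_dim = 4, 3
feature = [random.uniform(-1, 1) for _ in range(feat_dim)]
embedding = [random.uniform(-1, 1) for _ in range(emb_dim)]
w_gate = [[random.uniform(-0.5, 0.5) for _ in range(feat_dim + emb_dim)]
          for _ in range(feat_dim)]
b_gate = [0.0] * feat_dim
w_proj = [[random.uniform(-0.5, 0.5) for _ in range(emb_dim)]
          for _ in range(feat_dim)]
b_proj = [0.0] * feat_dim

fused_concat = concat_fusion(feature, embedding)
fused_gated = gated_fusion(feature, embedding, w_gate, b_gate, w_proj, b_proj)
print(len(fused_concat), len(fused_gated))
```

In this sketch, concatenation grows the feature dimension and leaves the weighting of the speaker information to later layers, whereas the gated variant keeps the feature dimension and modulates the speaker contribution per dimension. Whether the paper's GFM follows this form is an assumption.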
Pages: 1995-2001
Number of pages: 7