DUAL-BRANCH ATTENTION-IN-ATTENTION TRANSFORMER FOR SINGLE-CHANNEL SPEECH ENHANCEMENT

Cited: 51
Authors
Yu, Guochen [1 ,2 ]
Li, Andong [2 ]
Zheng, Chengshi [2 ]
Guo, Yinuo [3 ]
Wang, Yutian [1 ]
Wang, Hui [1 ]
Affiliations
[1] Commun Univ China, State Key Lab Media Convergence & Commun, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Acoust, Beijing, Peoples R China
[3] Bytedance, Beijing, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
Speech enhancement; dual-branch; attention-in-attention; transformer; NETWORKS;
DOI
10.1109/ICASSP43922.2022.9746273
CLC Number (Chinese Library Classification)
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Curriculum learning has begun to thrive in the speech enhancement area, as it decouples the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by this, we propose a dual-branch attention-in-attention transformer, dubbed DB-AIAT, to handle both coarse- and fine-grained regions of the spectrum in parallel. From a complementary perspective, a magnitude masking branch is proposed to coarsely estimate the overall magnitude spectrum, while a complex refining branch is elaborately designed to compensate for the missing spectral details and implicitly derive phase information. Within each branch, we propose a novel attention-in-attention transformer-based module to replace conventional RNNs and temporal convolutional networks for temporal sequence modeling. Specifically, the proposed attention-in-attention transformer consists of adaptive temporal-frequency attention transformer blocks and an adaptive hierarchical attention module, which aim to capture long-term temporal-frequency dependencies and further aggregate global hierarchical contextual information. Experimental results on Voice Bank + DEMAND demonstrate that DB-AIAT yields state-of-the-art performance (e.g., 3.31 PESQ, 95.6% STOI and 10.79 dB SSNR) over previous advanced systems with a relatively small model size (2.81 M).
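The complementary fusion described in the abstract can be illustrated with a short sketch. This is an assumption based on the text, not the authors' released code: the magnitude branch coarsely masks the noisy magnitude spectrum (reusing the noisy phase), and the complex refining branch adds a residual that restores spectral detail and implicitly corrects the phase. The function name `combine_branches` and the toy tensors are hypothetical.

```python
import numpy as np

def combine_branches(noisy_stft, mag_mask, complex_residual):
    """Fuse the two branch outputs into one enhanced complex spectrum.

    noisy_stft:       complex array (freq, time), STFT of noisy speech
    mag_mask:         real-valued mask in [0, 1], coarse magnitude estimate
    complex_residual: complex array, fine-grained refinement term
    """
    # Coarse estimate: masked magnitude recombined with the noisy phase.
    coarse = mag_mask * np.abs(noisy_stft) * np.exp(1j * np.angle(noisy_stft))
    # Fine estimate: the complex branch adds detail and phase correction.
    return coarse + complex_residual

# Toy example on a 2 x 2 spectrum.
noisy = np.array([[1 + 1j, 2 + 0j], [0 + 2j, 1 - 1j]])
mask = np.full((2, 2), 0.5)                    # attenuate everything by half
residual = np.zeros((2, 2), dtype=complex)     # no refinement in this toy case
enhanced = combine_branches(noisy, mask, residual)
# With a zero residual, the output is simply the masked noisy spectrum.
assert np.allclose(enhanced, 0.5 * noisy)
```

With a zero residual the fusion degenerates to classical magnitude masking with the noisy phase; the complex branch is what lets the model move beyond that baseline.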
Pages: 7847-7851 (5 pages)
Related Papers
50 records in total
  • [1] Yu, Guochen; Li, Andong; Wang, Hui; Wang, Yutian; Ke, Yuxuan; Zheng, Chengshi. DBT-Net: Dual-Branch Federative Magnitude and Phase Estimation With Attention-in-Attention Transformer for Monaural Speech Enhancement. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2022, 30: 2629-2644.
  • [2] Zhang, Kanghao; He, Shulin; Li, Hao; Zhang, Xueliang. DBNet: A Dual-branch Network Architecture Processing on Spectrum and Waveform for Single-channel Speech Enhancement. INTERSPEECH 2021, 2021: 2821-2825.
  • [3] Zhang, Kanghao; He, Shulin; Li, Hao; Zhang, Xueliang. A Dual-branch Convolutional Network Architecture Processing on both Frequency and Time Domain for Single-channel Speech Enhancement. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2023, 12 (03).
  • [4] Ma, Lei; Liu, Ziqian; Xu, Qihang; Hong, Hanyu; Wang, Lei; Zhu, Ying; Shi, Yu. Dual-branch channel attention enhancement feature fusion network for diabetic retinopathy segmentation. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 106.
  • [5] Zhang, Chong; Wang, Lingtong; Wei, Guohui; Kong, Zhiyong; Qiu, Min. A dual-branch and dual attention transformer and CNN hybrid network for ultrasound image segmentation. FRONTIERS IN PHYSIOLOGY, 2024, 15.
  • [6] Li, Xinshu; Tan, Zhenhua; Xia, Zhenche; Wu, Danke; Zhang, Bin. Single-Channel Speech Separation Focusing on Attention DE. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 3204-3209.
  • [7] Yu, Ping; Zhou, Long-Ran; Zhao, Xiao; Lu, Peng-Yu; Huang, Guan-Lin; Jiao, Jian; Bi, Feng-Yi. Inverseformer: Dual-Branch Network With Attention Enhancement for Density Interface Inversion. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21.
  • [8] Lu, Yunhua; Jiang, Mingzi; Liu, Zhi; Mu, Xinyu. Dual-branch adaptive attention transformer for occluded person re-identification. IMAGE AND VISION COMPUTING, 2023, 131.
  • [9] Fan, Junyi; Yang, Jibin; Zhang, Xiongwei; Yao, Yao. Real-time single-channel speech enhancement based on causal attention mechanism. APPLIED ACOUSTICS, 2022, 201.
  • [10] Yecchuri, Sivaramakrishna; Vanambathina, Sunny Dayal. Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement. EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 2024, 2024 (01).