DPT-FSNET: DUAL-PATH TRANSFORMER BASED FULL-BAND AND SUB-BAND FUSION NETWORK FOR SPEECH ENHANCEMENT

Times cited: 50
Authors
Dang, Feng [1 ,2 ,3 ]
Chen, Hangting [1 ]
Zhang, Pengyuan [1 ]
Affiliations
[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Inst Acoust, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
speech enhancement; frequency domain; dual-path transformer; full-band and sub-band fusion;
DOI
10.1109/ICASSP43922.2022.9746171
CLC classification
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
Sub-band models have achieved promising results due to their ability to model local patterns in the spectrogram. Some studies further improve performance by fusing sub-band and full-band information. However, the structure of full-band and sub-band fusion models has not been fully explored. This paper proposes a dual-path transformer-based full-band and sub-band fusion network (DPT-FSNet) for speech enhancement in the frequency domain. The intra and inter parts of the dual-path transformer model sub-band and full-band information, respectively. The features utilized by the proposed method are more interpretable than those utilized by the time-domain dual-path transformer. We conducted experiments on the Voice Bank + DEMAND and Interspeech 2020 Deep Noise Suppression (DNS) datasets to evaluate the proposed method. Experimental results show that the proposed method outperforms the current state of the art.
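The abstract's dual-path idea can be illustrated with a toy sketch. This is not the paper's architecture: the attention below has no learned projections, layer norms, or feed-forward sublayers, and the axis assignment (intra pass attending along time within each frequency bin for sub-band modeling, inter pass attending along frequency within each frame for full-band modeling) is one reading of the abstract, assumed here for illustration.

```python
import numpy as np

def toy_self_attention(x):
    # x: (seq_len, channels). Plain scaled dot-product self-attention
    # with a residual connection -- a stand-in for a transformer layer,
    # not the paper's actual intra/inter transformer.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return x + w @ x

def dual_path_block(spec):
    # spec: (T, F, C) time-frequency feature map.
    T, F, C = spec.shape
    # Intra (sub-band, assumed axis): attend along time for each frequency bin.
    out = np.stack([toy_self_attention(spec[:, f]) for f in range(F)], axis=1)
    # Inter (full-band, assumed axis): attend along frequency for each frame.
    out = np.stack([toy_self_attention(out[t]) for t in range(T)], axis=0)
    return out

x = np.random.randn(4, 8, 16)   # 4 frames, 8 frequency bins, 16 channels
y = dual_path_block(x)
print(y.shape)                  # same (T, F, C) shape as the input
```

The point of the sketch is the alternation: each pass sees only one axis as its sequence dimension, so stacking the two passes lets local (per-band) and global (per-frame) context interact without a full 2-D attention over the whole spectrogram.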
Pages: 6857-6861 (5 pages)