FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

Cited by: 60
Authors
Chen, Jun [1 ,2 ]
Wang, Zilin [1 ]
Tuo, Deyi [2 ]
Wu, Zhiyong [1 ,3 ]
Kang, Shiyin [2 ]
Meng, Helen [1 ,3 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Huya Inc, Guangzhou, Guangdong, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Source
2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2022
Keywords
speech enhancement; multi-scale time sensitive channel attention; phase information; full-band extractor;
DOI
10.1109/ICASSP43922.2022.9747888
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
The previously proposed FullSubNet achieved outstanding performance in the Deep Noise Suppression (DNS) Challenge and attracted much attention. However, it still encounters issues such as input-output mismatch and coarse processing of frequency bands. In this paper, we propose an extended single-channel real-time speech enhancement framework called FullSubNet+ with the following significant improvements. First, we design a lightweight multi-scale time-sensitive channel attention (MulCA) module that adopts a multi-scale convolution and channel attention mechanism to help the network focus on the more discriminative frequency bands for noise reduction. Then, to make full use of the phase information in noisy speech, our model takes the magnitude, real, and imaginary spectrograms as inputs. Moreover, by replacing the long short-term memory (LSTM) layers in the original full-band model with stacked temporal convolutional network (TCN) blocks, we design a more efficient full-band module called the full-band extractor. Experimental results on the DNS Challenge dataset show the superior performance of our FullSubNet+, which reaches state-of-the-art (SOTA) performance and outperforms other existing speech enhancement approaches.
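The abstract's MulCA idea — summarizing each frequency band with parallel temporal filters of several kernel sizes, then gating the bands by a learned attention weight — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the kernel sizes, the averaging fusion, and the fixed squeeze-and-excitation-style sigmoid gate standing in for the learned fully connected layers are all illustrative assumptions.

```python
import numpy as np

def mulca_weights(spec, kernel_sizes=(3, 5, 10)):
    """Sketch of multi-scale time-sensitive channel attention (MulCA).

    Each frequency band (channel) of the (F, T) magnitude spectrogram is
    smoothed along time by depthwise average filters at several scales,
    pooled over time, and mapped to a per-band weight in (0, 1).
    The sigmoid gate below is a fixed stand-in for the learned FC layers.
    """
    F, T = spec.shape
    feats = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k
        # depthwise temporal smoothing, one 1-D filter per frequency band
        smoothed = np.stack([np.convolve(spec[f], kernel, mode="same")
                             for f in range(F)])
        feats.append(smoothed.mean(axis=1))        # global average pool over time
    fused = np.stack(feats, axis=1).mean(axis=1)   # fuse the scales -> (F,)
    # squeeze-and-excitation-style gating on the standardized band summary
    w = 1.0 / (1.0 + np.exp(-(fused - fused.mean()) / (fused.std() + 1e-8)))
    return w

rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((257, 100)))   # toy (F=257, T=100) magnitudes
w = mulca_weights(spec)
attended = w[:, None] * spec                     # emphasize discriminative bands
```

In FullSubNet+ this reweighting is applied to the magnitude, real, and imaginary input spectrograms before the full-band and sub-band models, which is what lets the network concentrate on the more discriminative frequency bands.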
Pages: 7857-7861
Page count: 5