SNR-Progressive Model With Harmonic Compensation for Low-SNR Speech Enhancement

Times cited: 0
Authors
Hou, Zhongshu [1,2]
Lei, Tong [1,2]
Hu, Qinwen [1,2]
Cao, Zhanzhong [3]
Lu, Jing [1,2]
Affiliations
[1] Nanjing Univ, Key Lab Modern Acoust, Nanjing 210008, Peoples R China
[2] Horizon Robot, NJU Horizon Intelligent Audio Lab, Beijing 100094, Peoples R China
[3] Nanjing Inst Informat Technol, Nanjing 210036, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Harmonic analysis; Signal to noise ratio; Speech enhancement; Power harmonic filters; Estimation; Spectrogram; Noise measurement; Filtering; Training; Artificial neural networks; Low-SNR speech enhancement; neural network; pitch estimation; SNR-progressive learning
DOI
10.1109/LSP.2024.3484288
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from the intermediate output, which has the benefit of retaining more speech components than the coarse estimate while possessing a significantly higher SNR than the input noisy speech. An effective harmonic compensation mechanism is introduced for better harmonic recovery. Extensive experiments demonstrate the advantage of our proposed model.
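The abstract does not give the training details, so the following is only a minimal sketch of the general idea behind SNR-progressive learning: the intermediate target is the same clean speech mixed with the same noise, but at a higher SNR than the input. The function names, the 10 dB step, the number of stages, and the sine-wave stand-in for speech are all illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR = 10 * log10(clean_power / (gain^2 * noise_power)), solved for gain.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def progressive_targets(clean, noise, input_snr_db, step_db=10.0, stages=2):
    """Return [noisy input, intermediate targets..., clean target].

    step_db and stages are hypothetical choices made for illustration.
    """
    signals = [mix_at_snr(clean, noise, input_snr_db)]
    for k in range(1, stages):
        # Each intermediate target raises the SNR by step_db over the previous one.
        signals.append(mix_at_snr(clean, noise, input_snr_db + k * step_db))
    signals.append(clean.copy())  # the final stage targets clean speech
    return signals

def snr_db(clean, mixture):
    """Measure the SNR of `mixture` given the known clean component."""
    residual = mixture - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 200.0 * t)   # stand-in for voiced speech (assumption)
noise = rng.standard_normal(t.shape)    # white noise as a placeholder interferer
signals = progressive_targets(clean, noise, input_snr_db=-5.0)
```

In this sketch, `signals[1]` plays the role of the intermediate output described in the abstract: it still contains residual noise, but at a much higher SNR than the input, which is the property the abstract credits for making pitch estimation more reliable.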
Pages: 476-480
Page count: 5