SNR-Progressive Model With Harmonic Compensation for Low-SNR Speech Enhancement

Cited by: 0

Authors:

Hou, Zhongshu ^{[1,2]}

Lei, Tong ^{[1,2]}

Hu, Qinwen ^{[1,2]}

Cao, Zhanzhong ^{[3]}

Lu, Jing ^{[1,2]}

Affiliations:

[1] Nanjing Univ, Key Lab Modern Acoust, Nanjing 210008, Peoples R China

[2] Horizon Robot, NJU Horizon Intelligent Audio Lab, Beijing 100094, Peoples R China

[3] Nanjing Inst Informat Technol, Nanjing 210036, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2025 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Harmonic analysis; Signal to noise ratio; Speech enhancement; Power harmonic filters; Estimation; Spectrogram; Noise measurement; Filtering; Training; Artificial neural networks; Low-SNR speech enhancement; neural network; pitch estimation; SNR-progressive learning;

DOI:

10.1109/LSP.2024.3484288

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from the intermediate output, which has the benefit of retaining more speech components than the coarse estimate while possessing a significantly higher SNR than the input noisy speech. An effective harmonic compensation mechanism is introduced for better harmonic recovery. Extensive experiments demonstrate the advantage of our proposed model.

Pages: 476-480

Page count: 5

