Improved CEM for Speech Harmonic Enhancement in Single Channel Noise Suppression

Cited by: 3
Authors
Song, Yanjue [1 ]
Madhu, Nilesh [1 ]
Affiliations
[1] Univ Ghent, Imec, IDLab, B-9000 Ghent, Belgium
Keywords
Speech enhancement; Harmonic analysis; Speech processing; Signal-to-noise ratio; Cepstrum; Power harmonic filters; Noise reduction; Cepstral excitation manipulation; cepstral smoothing; harmonic synthesis; a priori SNR estimation
DOI
10.1109/TASLP.2022.3190725
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The periodic nature of voiced speech is often exploited to restore speech harmonics and to increase inter-harmonic noise suppression. In particular, a recent paper proposed to do this by manipulating the speech harmonic frequencies in the cepstral domain. The manipulations were carried out on the cepstrum of the excitation signal, obtained by the source-filter decomposition of speech. This method was termed Cepstral Excitation Manipulation (CEM). In this contribution we further analyse this method, point out its inherent weakness, and propose means to overcome it. First, we show by both illustrative examples and theoretical analysis that the existing method underestimates the excitation, especially in low signal-to-noise ratio (SNR) conditions. This weakness leads to weakened speech harmonics and to vocoding artefacts owing to insufficient noise suppression in the inter-harmonic regions. We then propose two modifications to improve the robustness and performance of CEM in low-SNR conditions. The first is to amplify the excitation cepstrum by an instantaneous, signal-adaptive factor instead of a pre-defined constant. The second is to smooth the excitation cepstrum so that additional fine structure is preserved rather than discarded. These modifications yield better preservation of the speech harmonics, a more refined fine structure, and stronger inter-harmonic noise suppression. Experimental evaluations using a range of standard instrumental metrics demonstrate that the proposed modifications clearly outperform the existing method, especially in extremely noisy conditions.
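To make the abstract's description concrete, the following is a minimal sketch of frame-wise cepstral excitation manipulation with the two proposed modifications, written in Python/NumPy. All names and heuristics here (e.g. `cem_enhance_frame`, `envelope_cutoff`, the adaptive pitch-peak gain, and the simple attenuation used as "smoothing") are illustrative assumptions and not the authors' exact formulation.

```python
import numpy as np

def cem_enhance_frame(noisy_spectrum, pitch_quefrency,
                      alpha=None, smooth_beta=0.5, envelope_cutoff=20):
    """Sketch of CEM-style harmonic enhancement for one STFT frame.

    noisy_spectrum  : one-sided complex STFT frame (length n_fft // 2 + 1)
    pitch_quefrency : quefrency bin of the pitch period (in samples)
    alpha           : pitch-peak amplification; None selects a signal-adaptive
                      value (modification 1, assumed heuristic)
    smooth_beta     : attenuation of the remaining excitation cepstrum instead
                      of discarding it (modification 2, assumed form)
    """
    n_fft = 2 * (len(noisy_spectrum) - 1)
    log_mag = np.log(np.abs(noisy_spectrum) + 1e-12)

    # Real cepstrum of the noisy frame.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)

    # Source-filter split: low quefrencies model the spectral envelope,
    # the remainder models the excitation (harmonic fine structure).
    envelope_cep = np.zeros_like(cepstrum)
    envelope_cep[:envelope_cutoff] = cepstrum[:envelope_cutoff]
    envelope_cep[-(envelope_cutoff - 1):] = cepstrum[-(envelope_cutoff - 1):]
    excitation_cep = cepstrum - envelope_cep

    # Modification 1 (assumed form): derive the amplification of the pitch peak
    # from the frame itself rather than using a fixed constant, so low-SNR
    # frames do not end up with an underestimated excitation.
    peak = excitation_cep[pitch_quefrency]
    if alpha is None:
        ref = np.max(np.abs(excitation_cep[envelope_cutoff:n_fft // 2])) + 1e-12
        alpha = np.clip(ref / (np.abs(peak) + 1e-12), 1.0, 8.0)

    # Modification 2 (assumed form): keep an attenuated version of the
    # remaining excitation cepstrum instead of zeroing it, preserving some
    # fine structure between the harmonics.
    manipulated = smooth_beta * excitation_cep
    manipulated[pitch_quefrency] = alpha * peak
    manipulated[n_fft - pitch_quefrency] = alpha * peak   # mirrored quefrency

    # Back to the spectral domain; derive a real-valued gain per frequency bin.
    enhanced_log_mag = np.fft.rfft(envelope_cep + manipulated).real
    gains = np.exp(enhanced_log_mag - log_mag)
    return gains * noisy_spectrum


if __name__ == "__main__":
    # Toy usage: a synthetic 512-sample frame with an assumed pitch period of
    # 100 samples (i.e. quefrency bin 100).
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(512)
    enhanced = cem_enhance_frame(np.fft.rfft(frame), pitch_quefrency=100)
    print(enhanced.shape)   # (257,)
```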
Pages: 2492-2503
Number of pages: 12
Cited references
21 in total
  • [1] [Anonymous], 2006, BACKGR NOIS SIM TE 1
  • [2] [Anonymous], 2017, CORRIGENDUM 1 WIDEBA
  • [3] C. Breithaupt, T. Gerkmann, and R. Martin, "A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008, pp. 4897-4900.
  • [4] C. Breithaupt, T. Gerkmann, and R. Martin, "Cepstral smoothing of spectral filter gains for speech enhancement without musical noise," IEEE Signal Processing Letters, vol. 14, no. 12, pp. 1036-1039, 2007.
  • [5] R. Chen, C.-F. Chan, and H. C. So, "Model-based speech enhancement with improved spectral envelope estimation via dynamics tracking," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1324-1336, 2012.
  • [6] S. Elshamy and T. Fingscheidt, "DNN-based cepstral excitation manipulation for speech enhancement," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 11, pp. 1803-1814, 2019.
  • [7] S. Elshamy, N. Madhu, W. Tirry, and T. Fingscheidt, "Instantaneous a priori SNR estimation by cepstral excitation manipulation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 8, pp. 1592-1605, 2017.
  • [8] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, 1985.
  • [9] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, 1984.
  • [10] T. Fingscheidt, S. Suhadi, and S. Stan, "Environment-optimized speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 4, pp. 825-834, 2008.