IMPROVEMENTS ON MEL-FREQUENCY CEPSTRUM MINIMUM-MEAN-SQUARE-ERROR NOISE SUPPRESSOR FOR ROBUST SPEECH RECOGNITION

Citations: 0
Authors
Yu, Dong [1]
Deng, Li [1]
Wu, Jian [1]
Gong, Yifan [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Corp, Redmond, WA 98052 USA
Source
2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS | 2008
Keywords
MMSE Estimator; MFCC; Noise Reduction; Robust ASR; Speech Feature Enhancement; RPROP; SADLA;
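DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract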
We recently developed a non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) for environment-robust speech recognition. The algorithm operates on the power spectral magnitude of the filter-bank outputs and outperforms the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah in both recognition accuracy and efficiency, as demonstrated on the Aurora-3 corpora. This paper serves two purposes. First, we show that the algorithm is effective on large-vocabulary tasks with tri-phone acoustic models. Second, we report two improvements to the suppression rule of the original MFCC-MMSE noise suppressor: smoothing the gain over previous frames to prevent abrupt gain changes from frame to frame, and adjusting the gain function based on the noise power so that suppression is aggressive when the noise level is high and conservative when it is low. We also propose an efficient and effective parameter tuning algorithm, the step-adaptive discriminative learning algorithm (SADLA), to adjust the parameters used by the noise tracker and the suppressor. On an in-house large-vocabulary noisy speech database with a clean-trained model, we observed a 46% relative word error rate (WER) reduction, which translates into a 16% relative WER reduction over the original MFCC-MMSE noise suppressor. On the Aurora-3 corpora, we observed a 6% relative WER reduction over our original MFCC-MMSE algorithm, or a 30% relative WER reduction over the CMN baseline.
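The two suppression-rule changes described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the first-order recursive smoother, the exponent-based adjustment, and the names `smooth_gains`, `adjust_gain`, `alpha`, `ref_power`, `min_exponent`, and `max_exponent` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def smooth_gains(gains: Sequence[float], alpha: float = 0.7) -> List[float]:
    """Smooth per-frame gains with a first-order recursive average.

    ASSUMPTION: the paper's smoothing rule is not given in the abstract;
    this mixes the previous smoothed gain with the current raw gain.
    A larger alpha means slower changes from frame to frame.
    """
    smoothed: List[float] = []
    prev = None
    for g in gains:
        prev = g if prev is None else alpha * prev + (1.0 - alpha) * g
        smoothed.append(prev)
    return smoothed


def adjust_gain(
    gain: float,
    noise_power: float,
    ref_power: float = 1.0,
    min_exponent: float = 0.5,
    max_exponent: float = 2.0,
) -> float:
    """Make a gain in [0, 1] more or less aggressive depending on noise power.

    ASSUMPTION: a hypothetical rule, not the paper's. The gain is raised to
    an exponent that grows from min_exponent (low noise, conservative) to
    max_exponent (high noise, aggressive) as noise_power increases.
    """
    ratio = noise_power / (noise_power + ref_power)  # maps [0, inf) to [0, 1)
    exponent = min_exponent + (max_exponent - min_exponent) * ratio
    clipped = min(max(gain, 0.0), 1.0)
    return clipped ** exponent


if __name__ == "__main__":
    raw = [1.0, 0.2, 0.9, 0.1]
    print(smooth_gains(raw, alpha=0.5))
    print(adjust_gain(0.5, noise_power=0.01), adjust_gain(0.5, noise_power=100.0))
```

In this sketch the same raw gain of 0.5 yields a smaller (more suppressive) value at high noise power than at low noise power, which mirrors the qualitative behaviour the abstract describes without claiming to reproduce the authors' suppression rule.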
Pages: 69-72
Page count: 4