IMPROVEMENTS ON MEL-FREQUENCY CEPSTRUM MINIMUM-MEAN-SQUARE-ERROR NOISE SUPPRESSOR FOR ROBUST SPEECH RECOGNITION

Cited by: 0

Authors:

Yu, Dong [1]

Deng, Li [1]

Wu, Jian [1]

Gong, Yifan [1]

Acero, Alex [1]

Affiliation:

[1] Microsoft Corp, Redmond, WA 98052 USA

Source:

2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS | 2008

Keywords:

MMSE Estimator; MFCC; Noise Reduction; Robust ASR; Speech Feature Enhancement; RPROP; SADLA;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We recently developed a non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) for environment-robust speech recognition. The algorithm operates on the power spectral magnitude of the filter-bank outputs and outperforms the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah in both recognition accuracy and efficiency, as demonstrated on the Aurora-3 corpora. This paper serves two purposes. First, we show that the algorithm is effective on large-vocabulary tasks with tri-phone acoustic models. Second, we report two improvements to the suppression rule of the original MFCC-MMSE noise suppressor: smoothing the gain over previous frames to prevent abrupt gain changes from frame to frame, and adjusting the gain function based on the noise power so that suppression is aggressive when the noise level is high and conservative when it is low. We also propose an efficient and effective parameter tuning algorithm, the step-adaptive discriminative learning algorithm (SADLA), to adjust the parameters used by the noise tracker and the suppressor. We observed a 46% relative word error rate (WER) reduction on an in-house large-vocabulary noisy speech database with a clean-trained model, which translates into a 16% relative WER reduction over the original MFCC-MMSE noise suppressor. On the Aurora-3 corpora, we observed a 6% relative WER reduction over our original MFCC-MMSE algorithm, or a 30% relative WER reduction over the CMN baseline.
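The relative WER figures in the abstract can be related to one another with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function name `implied_relative_reduction` is hypothetical, and it assumes that the 46% figure is measured against a baseline without noise suppression and that both figures in each pair refer to the same test set. Under those assumptions, it estimates how much of the total gain the original MFCC-MMSE suppressor already delivered.

```python
def implied_relative_reduction(total: float, incremental: float) -> float:
    """Relative WER reduction of an intermediate system over the baseline.

    total:       relative WER reduction of the improved system vs. the baseline
    incremental: relative WER reduction of the improved system vs. the
                 intermediate system (here, the original MFCC-MMSE suppressor)

    Uses: WER_improved = (1 - total) * WER_baseline
                       = (1 - incremental) * WER_intermediate
    """
    intermediate_over_baseline = (1.0 - total) / (1.0 - incremental)
    return 1.0 - intermediate_over_baseline


# Figures quoted in the abstract (the pairing is an assumption, see lead-in).
in_house = implied_relative_reduction(total=0.46, incremental=0.16)
aurora3 = implied_relative_reduction(total=0.30, incremental=0.06)

print(f"in-house, original suppressor vs. baseline: {in_house:.1%}")
print(f"Aurora-3, original suppressor vs. CMN:      {aurora3:.1%}")
```

Relative reductions compose multiplicatively rather than additively, which is why the difference 46% - 16% = 30% would understate what the original suppressor achieved in the in-house case.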


Pages: 69-72

Page count: 4

24 items in total

[1] A minimum-mean-square-error noise reduction algorithm on Mel-frequency cepstra for robust speech recognition
Yu, Dong
Deng, Li
Droppo, Jasha
Wu, Jian
Gong, Yifan
Acero, Alex
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008 : 4041 - 4044
[2] IMPROVED CEPSTRA MINIMUM-MEAN-SQUARE-ERROR NOISE REDUCTION ALGORITHM FOR ROBUST SPEECH RECOGNITION
Li, Jinyu
Huang, Yan
Gong, Yifan
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017 : 4865 - 4869
[3] Speaker Recognition Using Mel-Frequency Cepstrum Coefficients and Sum Square Error
Charisma, Atik
Hidayat, M. Reza
Zainal, Yuda Bakti
2017 3RD INTERNATIONAL CONFERENCE ON WIRELESS AND TELEMATICS (ICWT), 2017 : 160 - 163
[4] Robust speech recognition using a cepstral minimum-mean-square-error-motivated noise suppressor
Yu, Dong
Deng, Li
Droppo, Jasha
Wu, Jian
Gong, Yifan
Acero, Alex
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (05) : 1061 - 1070
[5] A THEORETICALLY CONSISTENT METHOD FOR MINIMUM MEAN-SQUARE ERROR ESTIMATION OF MEL-FREQUENCY CEPSTRAL FEATURES
Jensen, Jesper
Tan, Zheng-Hua
2014 4TH IEEE INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2014 : 368 - 373
[6] Minimum-mean-square-error filters for detecting a noisy target in background noise
Javidi, B
Parchekani, F
Zhang, GS
APPLIED OPTICS, 1996, 35 (35) : 6964 - 6975
[7] Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features-A Theoretically Consistent Approach
Jensen, Jesper
Tan, Zheng-Hua
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (01) : 186 - 197
[8] Minimum-mean-square-error filters for detecting a noisy target in background noise
Javidi, Bahram
Parchekani, Farokh
Zhang, Guanshen
1996, Optical Society of America (35):
[9] Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients
Pawar, Manju D.
Kokate, Rajendra D.
MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (10) : 15563 - 15587
[10] Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients
Manju D. Pawar
Rajendra D. Kokate
Multimedia Tools and Applications, 2021, 80 : 15563 - 15587
