Robust Mandarin speech recognition in car environments for embedded navigation system

Cited by: 7
Authors
Ding, Pei [1]
He, Lei [1]
Yan, Xiang [1]
Zhao, Rui [1]
Hao, Jie [1]
Affiliation
[1] Toshiba China, Speech Grp, Res & Dev Ctr, Beijing 100738, Peoples R China
Keywords
speech recognition; noise robustness; accented Mandarin; car navigation
DOI
10.1109/TCE.2008.4560134
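Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract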
A low-cost, robust Mandarin speech recognition system is investigated for embedded car navigation applications. In the front end, the log-spectral minimum mean-square error (LogMMSE) estimation algorithm is applied to suppress background noise, and a piecewise linear function is used to approximate the traditional Taylor expansion in its gain function calculation to reduce computational complexity. After speech enhancement, spectral smoothing is applied along both the time and frequency indexes with geometric sequence weights to further compensate for the spectral components distorted by noise over-reduction. In acoustic model training, an immunity learning scheme is applied, in which pre-recorded car noise is artificially added to clean training utterances to simulate the in-car environment. In the context of Mandarin speech recognition, a special difficulty is the diversity of Chinese dialects: pronunciation differences among accents degrade recognition performance if the acoustic models are trained with a mismatched accented database. We propose to train the models with multiple accented Mandarin databases to deal with this problem. Evaluation results of isolated phrase recognition confirm the effectiveness of the proposed technologies.
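The noise-addition step behind the immunity learning scheme can be illustrated with a minimal sketch that mixes a noise recording into a clean utterance at a chosen signal-to-noise ratio. Everything specific here is an assumption for illustration, not taken from the paper: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine-wave "utterance", the Gaussian stand-in for recorded car noise, and the 10 dB target. The abstract does not state which SNR levels or noise recordings the authors used.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean utterance so the mixture has the requested SNR (dB)."""
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Loop the noise recording so it covers the whole utterance, then trim it.
    reps = -(-len(clean) // len(noise))  # ceiling division
    noise = (noise * reps)[:len(clean)]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise segment is silent")
    # Choose gain so that clean_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture, given the clean signal it was built from."""
    clean_power = sum(x * x for x in clean)
    residual_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(clean_power / residual_power)


# Hypothetical stand-ins: a 200 Hz tone as the "clean utterance" and
# Gaussian noise as the "pre-recorded car noise" (both assumptions).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
car_noise = [rng.gauss(0.0, 1.0) for _ in range(3000)]
noisy = mix_at_snr(clean, car_noise, snr_db=10.0)
```

In a training pipeline of this kind, each clean utterance would typically be mixed at several SNR levels so that the acoustic models see a range of in-car conditions; the specific levels would have to come from the paper itself.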
Pages: 584-590
Page count: 7