Reconstruction of missing features for robust speech recognition

被引:154
作者
Raj, B
Seltzer, ML
Stern, RM
机构
[1] Mitsubishi Electr Corp, Res Labs, Cambridge, MA 02139 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
关键词
D O I
10.1016/j.specom.2004.03.007
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Speech recognition systems perform poorly in the presence of corrupting noise. Missing feature methods attempt to compensate for the noise by removing noise corrupted components of spectrographic representations of noisy speech and performing recognition with the remaining reliable components. Conventional classifier-compensation methods modify the recognition system to work with the incomplete representations so obtained. This constrains them to perform recognition using spectrographic features which are known to be less optimal than cepstra. In this paper we present two missing-feature algorithms that reconstruct complete spectrograms from incomplete noisy ones. Cepstral vectors can now be derived from the reconstructed spectrograms for recognition. The first algorithm uses MAP procedures to estimate corrupt components from their correlations with reliable components. The second algorithm clusters spectral vectors of clean speech. Corrupt components of noisy speech are estimated from the distribution of the cluster that the analysis frame is identified with. Experiments show that, although conventional classifier-compensation methods are superior when recognition is performed with spectrographic features, cepstra derived from the reconstructed spectrograms result in better recognition performance overall. The proposed methods are also less expensive computationally and do not require modification of the recognizer. (C) 2004 Elsevier B.V. All rights reserved.
引用
收藏
页码:275 / 296
页数:22
相关论文
共 31 条
[1]  
ACERO A, 1993, ACOUSTIC ENV ROBUSTN
[2]  
[Anonymous], 1996, THESIS CARNEGIE MELL
[3]   SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION [J].
BOLL, SF .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02) :113-120
[4]   Robust automatic speech recognition with missing and unreliable acoustic data [J].
Cooke, M ;
Green, P ;
Josifovski, L ;
Vizinho, A .
SPEECH COMMUNICATION, 2001, 34 (03) :267-285
[5]  
COOKE M, 1994, TR940501 U SHEFF DEP
[6]  
COOKE MP, 1994, P 3 INT C SPOK LANG, P1555
[7]  
COOKE MP, 1997, P IEEE C AC SPEECH S
[8]   COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J].
DAVIS, SB ;
MERMELSTEIN, P .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) :357-366
[9]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[10]  
Drygajlo A, 1998, INT CONF ACOUST SPEE, P121, DOI 10.1109/ICASSP.1998.674382