Feature normalization based on non-extensive statistics for speech recognition

Cited by: 17
Authors
Pardede, Hilman F. [1]
Iwano, Koji [2]
Shinoda, Koichi [1]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Grad Sch Informat Sci & Engn, Meguro Ku, Tokyo 1528552, Japan
[2] Tokyo City Univ, Fac Environm & Informat Studies, Tsuzuki Ku, Yokohama, Kanagawa 2248551, Japan
Keywords
Robust speech recognition; Normalization; q-Logarithm; Non-extensive statistics; CROSS-TERMS; NOISE; MODEL; ENHANCEMENT; ENVIRONMENT; SPECTRA; ALGEBRA
DOI
10.1016/j.specom.2013.02.004
CLC number (Chinese Library Classification)
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Most compensation methods for improving the robustness of speech recognition systems in noisy environments rely on the assumption that noise and speech spectra are independent. Examples include spectral subtraction, cepstral mean normalization (CMN), and mean and variance normalization (MVN). However, the finite-length analysis window used in signal processing may introduce a cross-term between them, which degrades speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics. We then propose q-log spectral mean normalization (q-LSMN), an extension of log spectral mean normalization (LSMN) to this domain. Recognition experiments on Aurora-2, a database of synthesized noisy speech, showed that q-LSMN was consistently better than the conventional normalization methods CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective in a real noisy environment, the CENSREC-2 database, where it significantly outperformed the ETSI advanced front-end (AFE). (C) 2013 Elsevier B.V. All rights reserved.
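The abstract names the q-log spectral domain and q-LSMN but does not define them. The following is a minimal sketch under stated assumptions. It assumes the q-logarithm is the standard Tsallis definition ln_q(x) = (x^(1-q) - 1)/(1-q). Under that definition, q = 1 gives the natural log and q = 0 gives the linear map x - 1. The sketch also assumes q-LSMN maps each filterbank channel into that domain and subtracts the channel's utterance mean, exactly as LSMN does in the log domain. The function names, the energy floor `eps`, the value q = 0.9, and the synthetic input are all hypothetical illustrations. The paper's actual normalization may differ.

```python
import numpy as np


def q_log(x, q):
    """Tsallis q-logarithm: (x**(1 - q) - 1) / (1 - q) for x > 0.

    q = 1 gives the natural log; q = 0 gives the linear map x - 1.
    """
    x = np.asarray(x, dtype=float)
    if q == 1.0:
        return np.log(x)
    return (np.power(x, 1.0 - q) - 1.0) / (1.0 - q)


def q_lsmn(power_spec, q, eps=1e-10):
    """Hypothetical q-LSMN sketch (an assumption, not the authors' exact procedure).

    power_spec: array of shape (frames, channels), e.g. mel filterbank energies.
    Each channel is mapped to the q-log domain, and its mean over the
    utterance is subtracted. With q = 1 this reduces to plain LSMN.
    """
    # Floor the energies so the q-log stays finite for silent channels.
    feats = q_log(np.maximum(power_spec, eps), q)
    return feats - feats.mean(axis=0, keepdims=True)


# Usage on synthetic data: 200 frames x 23 channels of positive "energies".
rng = np.random.default_rng(0)
spec = rng.uniform(0.1, 10.0, size=(200, 23))
normalized = q_lsmn(spec, q=0.9)
print(normalized.shape)  # (200, 23)
print(np.allclose(normalized.mean(axis=0), 0.0))  # True: every channel is zero-mean
```

For 0 < q < 1, the q-log acts as a power-law compression that lies between the linear spectrum (q = 0) and the log spectrum (q = 1). It also satisfies ln_q(xy) = ln_q(x) + ln_q(y) + (1-q) ln_q(x) ln_q(y). For q ≠ 1, a multiplicative channel therefore does not become purely additive, so this plain mean subtraction only approximates whatever compensation the authors derive.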
Pages: 587-599
Number of pages: 13
References: 47