Using asymmetric windows in automatic speech recognition

Cited by: 12
Authors
Rozman, Robert [1]
Kodek, Dusan M. [1]
Affiliation
[1] Univ Ljubljana, Fac Comp & Informat Sci, Lab Architecture & Signal Proc, Ljubljana 1001, Slovenia
Keywords
asymmetric windows; windowing; robustness; automatic speech recognition; Short Time Fourier Transform;
DOI
10.1016/j.specom.2007.01.012
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper considers the windowing problem of the short-time frequency analysis that is used in speech recognition systems (SRS). Since human hearing is relatively insensitive to short-time phase distortion of the speech signal, there is no apparent reason to use symmetric windows, which give a linear phase response. Furthermore, phase information is usually completely disregarded in SRS. This should be contrasted with the well-known fact that relaxing the linearity constraint on window phase results in a better magnitude response and a shorter time delay. These observations form a strong argument in favor of the research presented in this paper. First, a general overview of the role that windows play in the frequency analysis stage of SRS is presented. Properties that are important for speech recognition are highlighted, and potential advantages of asymmetric windows are presented. Among them, the shorter time delay and the better magnitude response are the most important. Two possible design methods for asymmetric windows are discussed. Since little is known about the influence of the window on SRS performance, the design methods are first considered from a frequency analysis point of view. This is followed by practical evaluations on real SRS. The results confirmed the expectations. The proposed asymmetric windows increased the robustness of elementary, isolated and connected speech recognition under a variety of adverse test conditions. This is particularly true for a combination of additive and low-pass convolutional distortions. Further research on asymmetric windows and on the parameterization process as a whole is suggested. (c) 2007 Elsevier B.V. All rights reserved.
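The delay argument in the abstract can be illustrated with a minimal sketch. It does not reproduce either of the paper's two design methods, which the abstract does not specify. The `asymmetric_window` function below is a hypothetical construction (a slow raised-cosine rise joined to a faster fall), and `centroid_delay` is an assumed, simplified measure of delay: the distance from the window's energy centroid to the newest sample in the frame. The `magnitude_spectrum` helper shows that only the magnitude is kept, so the window's phase response is discarded.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (linear phase)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def asymmetric_window(n, peak):
    """Hypothetical asymmetric window, NOT the paper's design:
    raised-cosine rise over `peak` samples, then a faster fall to zero."""
    rise = [0.5 - 0.5 * math.cos(math.pi * i / peak) for i in range(peak)]
    fall_len = n - peak
    fall = [0.5 + 0.5 * math.cos(math.pi * i / (fall_len - 1))
            for i in range(fall_len)]
    return rise + fall


def centroid_delay(w):
    """Samples between the window's energy centroid and the newest sample."""
    energy = [v * v for v in w]
    centroid = sum(i * e for i, e in enumerate(energy)) / sum(energy)
    return (len(w) - 1) - centroid


def magnitude_spectrum(frame, w):
    """Windowed DFT magnitudes for bins 0..n/2; phase is thrown away."""
    n = len(frame)
    x = [f * v for f, v in zip(frame, w)]
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)))
            for k in range(n // 2 + 1)]


N = 256
sym = hamming(N)
asym = asymmetric_window(N, peak=192)  # peak placed late in the frame

# Test tone centered on DFT bin 16, analyzed with the asymmetric window.
tone = [math.cos(2 * math.pi * 16 * i / N) for i in range(N)]
spec = magnitude_spectrum(tone, asym)

print("symmetric delay (samples): ", centroid_delay(sym))
print("asymmetric delay (samples):", centroid_delay(asym))
print("strongest bin:", spec.index(max(spec)))
```

Placing the peak later in the frame weights recent samples more heavily, which is the sense in which such a window can shorten the delay. Whether the magnitude response also improves depends on the design method and is not shown by this sketch.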
Pages: 268-276
Page count: 9