Further intelligibility results from human listening tests using the short-time phase spectrum

Cited by: 47
Authors
Alsteris, Leigh D. [1]
Paliwal, Kuldip K. [1]
Affiliations
[1] Griffith Univ, Sch Microelect Engn, Brisbane, Qld 4111, Australia
Keywords
short-time Fourier transform; phase spectrum; magnitude spectrum; speech perception; overlap-add procedure; automatic speech recognition; feature extraction; group delay function; instantaneous frequency distribution
DOI
10.1016/j.specom.2005.10.005
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
State-of-the-art automatic speech recognition (ASR) systems use only the short-time magnitude spectrum for feature extraction; the short-time phase spectrum is generally ignored in these systems. Results from our recent human listening tests indicate that the short-time phase spectrum can contribute significantly to speech intelligibility over small window durations (i.e., 20-40 ms). This is an interesting result, indicating the possible usefulness of the short-time phase spectrum for ASR, which commonly employs small window durations of 20-40 ms for spectral analysis. In this paper, we continue our investigation of the short-time phase spectrum. We explore the use of partial short-time phase spectrum information, in the absence of all short-time magnitude spectrum information, for intelligible signal reconstruction. We create two types of stimuli: one in which the frequency derivative of the phase spectrum (i.e., the group delay function, GDF) is preserved and another in which its time derivative (i.e., the instantaneous frequency distribution, IFD) is preserved. We do this to determine the contribution that each of these derivatives makes toward intelligibility. Reconstructing stimuli from knowledge of only the GDF or only the IFD results in poor intelligibility. However, when we create stimuli using knowledge of both the GDF and the IFD, reasonable intelligibility is obtained. In light of these results, we conclude that both the GDF and IFD components of the short-time phase spectrum are needed to reconstruct an intelligible signal. In addition, we perform some experiments to quantify the intelligibility of stimuli reconstructed from the short-time phase and magnitude spectra of noisy speech. The intelligibility of stimuli constructed from either the short-time magnitude spectrum or the short-time phase spectrum degrades at a similar rate under increasing noise levels. The intelligibility of the original signals also degrades with increased noise, but in all cases it is superior to that provided by the stimuli constructed from the separate short-time components. Therefore, we argue that knowledge of both short-time magnitude and phase spectrum information results in superior human speech recognition performance. © 2005 Elsevier B.V. All rights reserved.
Pages: 727-736
Number of pages: 10