Speech formant frequency estimation: evaluating a nonstationary analysis method

Times cited: 13
Authors
Rao, P [1]
Das Barman, A [1]
Affiliation
[1] Indian Institute of Technology, Department of Electrical Engineering, Kanpur 208016, Uttar Pradesh, India
Keywords
speech analysis; formant tracking; instantaneous frequency
DOI
10.1016/S0165-1684(00)00099-2
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
The objective of this paper is to critically evaluate the performance of a nonstationary analysis method in tracking speech formant frequencies as they change with time due to natural variations in the vocal-tract system during speech production. Instantaneous frequency estimation is applied to formant tracking in order to observe the time variations in the vocal-tract system characteristics within a pitch period. An implementation of an instantaneous frequency estimator for voiced speech formants, based on the source-filter model of speech production, is described. Based on experimental results from simulated as well as natural speech data, it is shown that the accuracy of the frequency estimates depends heavily on the nature of the glottal excitation waveform, the fundamental frequency, and the frequency spacing of the formants in the speech signal. The effect of the choice of the various analysis parameters on the accuracy of the estimates is discussed. It is shown that the instantaneous frequency estimate accurately represents the actual formant frequency only when the formants are well separated and there are distinct regions of the glottal cycle in which the source excitation can be considered negligible. Experimental results are presented for natural speech vowels that show differences in formant frequencies across the different phases of the glottal cycle. (C) 2000 Elsevier Science B.V. All rights reserved.
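The abstract's central claim, that an instantaneous frequency (IF) estimate follows a formant only where the excitation is negligible and the formants are well separated, can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' estimator: it substitutes the DESA-2 energy-separation algorithm of Maragos, Kaiser and Quatieri, drives parallel damped resonators with an idealized impulse train rather than a realistic glottal waveform, and uses made-up parameters (8 kHz sampling, F0 = 100 Hz, 60 Hz bandwidths, formants at 500 Hz and 700 Hz). The helper names `synth_vowel`, `desa2_hz` and `closed_region` are invented for the example. Estimates are kept only where the 5-sample DESA-2 stencil lies between two pulses, i.e. in the free-decay part of the pitch period.

```python
import math


def synth_vowel(formants_hz, amps, bws_hz, f0_hz, n_samples, fs):
    """Idealized source-filter model (an assumption, not the paper's):
    an impulse train at f0_hz drives parallel resonators, each adding
    amp * exp(-pi * bw * t) * cos(2 * pi * f * t) after every pulse."""
    period = round(fs / f0_hz)
    x = [0.0] * n_samples
    for start in range(0, n_samples, period):      # pulse instants
        for n in range(start, n_samples):
            t = (n - start) / fs
            for f, a, bw in zip(formants_hz, amps, bws_hz):
                x[n] += a * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * f * t)
    return x


def teager(v):
    """Discrete Teager energy psi[n] = v[n]^2 - v[n-1] * v[n+1], n = 1 .. len(v) - 2."""
    return [v[n] ** 2 - v[n - 1] * v[n + 1] for n in range(1, len(v) - 1)]


def desa2_hz(x, fs):
    """DESA-2 instantaneous frequency in Hz; element i is centred on x[i + 2]."""
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]  # y[k] centred on x[k + 1]
    psi_x = teager(x)[1:-1]   # centred on x[2] .. x[len(x) - 3]
    psi_y = teager(y)         # same centres as psi_x
    out = []
    for px, py in zip(psi_x, psi_y):
        if px <= 0.0:         # non-positive energy: DESA-2 gives no valid estimate
            out.append(float("nan"))
            continue
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))  # clamp into the acos domain
        out.append(0.5 * math.acos(c) * fs / (2.0 * math.pi))
    return out


def closed_region(est, fs, f0_hz, k):
    """Estimates for period k whose 5-sample stencil contains no pulse,
    i.e. only the freely decaying resonator response."""
    period = round(fs / f0_hz)
    start = k * period
    return est[start:start + period - 4]


# Hypothetical analysis settings, not taken from the paper.
FS, F0, N = 8000, 100, 800
one = synth_vowel([500], [1.0], [60], F0, N, FS)
two = synth_vowel([500, 700], [1.0, 0.3], [60, 60], F0, N, FS)

single = closed_region(desa2_hz(one, FS), FS, F0, 5)
close = closed_region(desa2_hz(two, FS), FS, F0, 5)
print(f"one formant:  {min(single):.1f} .. {max(single):.1f} Hz")
print(f"two formants: {min(close):.1f} .. {max(close):.1f} Hz")
```

With a single resonance, the estimates in the free-decay region should stay within a few hertz of 500 Hz (DESA-2 has a small bias on damped signals). Adding a weaker resonance only 200 Hz away should make the estimates swing by well over 100 Hz within the same period, dipping well below the first formant, which is the kind of error the abstract attributes to closely spaced formants.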
Pages: 1655-1667
Number of pages: 13