SAFE: A Statistical Approach to F0 Estimation Under Clean and Noisy Conditions

Cited by: 37
Authors
Chu, Wei [1]
Alwan, Abeer [1]
Affiliation
[1] Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90095 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 3
Funding
U.S. National Science Foundation;
Keywords
F0; estimation; noisy speech; pitch detection; statistical learning; FUNDAMENTAL-FREQUENCY ESTIMATION; PITCH; SPEECH; AUTOCORRELATION; ALGORITHM;
DOI
10.1109/TASL.2011.2168518
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A novel Statistical Algorithm for F0 Estimation (SAFE) is proposed to improve the accuracy of F0 estimation under both clean and noisy conditions. Prominent signal-to-noise ratio (SNR) peaks in speech spectra constitute a robust information source from which F0 can be inferred. A probabilistic framework is proposed to model the effect of noise on voiced speech spectra. Prominent SNR peaks in the low-frequency band (0-1000 Hz) are important to F0 estimation, and prominent SNR peaks in the middle- and high-frequency bands (1000-3000 Hz) provide useful supplemental information for F0 estimation under noisy conditions, especially babble noise. Experiments show that the SAFE algorithm has the lowest gross pitch errors (GPEs) of the prevailing F0 trackers compared, in white and babble noise conditions at low SNRs. Experimental results also show that SAFE robustly maintains a low mean and standard deviation of the fine pitch errors (MFPE and SDFPE) in noise. The SAFE code is available at http://www.ee.ucla.edu/~weichu/safe.
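The abstract's two technical ideas can be illustrated with a short Python sketch: F0 can be inferred from prominent spectral peaks, and F0 trackers are scored by GPE, MFPE, and SDFPE. This is not the SAFE algorithm. SAFE models how noise affects SNR peaks probabilistically, whereas the hypothetical `harmonic_f0_from_peaks` below simply scores candidate F0 values on a 1-Hz grid by how well their harmonics line up with a given list of peaks. It assumes peak picking has already been done, and it treats the peak weights as a stand-in for SNR prominence. The hypothetical `pitch_error_stats` follows a common convention: a gross error is a relative deviation above 20%, and fine errors are absolute deviations in Hz over the remaining frames. This convention may not match the paper's exact definitions.

```python
"""Hypothetical sketch: harmonic matching over spectral peaks, plus GPE/MFPE/SDFPE.

NOT the SAFE probabilistic model; it only illustrates inferring F0 from
prominent spectral peaks and the error measures named in the abstract.
"""
from statistics import mean, pstdev


def harmonic_f0_from_peaks(peaks, f0_range=(60, 400), band=(0.0, 1000.0), tol=0.04):
    """Return the candidate F0 (1-Hz grid) whose harmonics best explain the peaks.

    peaks: list of (frequency_hz, weight) pairs. The weight is an assumed
    stand-in for how prominent the peak's SNR is. Returns None if no peak
    falls inside `band` or no candidate matches any peak.
    """
    # Keep only peaks inside the analysis band (0-1000 Hz by default).
    peaks = [(f, w) for f, w in peaks if band[0] < f <= band[1]]
    if not peaks:
        return None
    total_weight = sum(w for _, w in peaks)
    top_freq = max(f for f, _ in peaks)
    best_f0, best_score = None, 0.0
    for f0 in range(f0_range[0], f0_range[1] + 1):
        # Harmonics that could match a peak, given the relative tolerance.
        expected = int(top_freq * (1.0 + tol) // f0)
        if expected == 0:
            continue
        matched_k, matched_weight = set(), 0.0
        for f, w in peaks:
            k = round(f / f0)  # nearest harmonic number
            dev = abs(f - k * f0)
            if k >= 1 and dev <= tol * f:
                matched_k.add(k)
                # Linear credit: full weight on the harmonic, zero at the tolerance edge.
                matched_weight += w * (1.0 - dev / (tol * f))
        # Reward explaining most of the peak weight AND filling most expected
        # harmonics; the second factor penalizes subharmonics (F0/2, F0/3, ...).
        score = (matched_weight / total_weight) * (len(matched_k) / expected)
        if score > best_score:
            best_f0, best_score = float(f0), score
    return best_f0


def pitch_error_stats(ref, est, threshold=0.2):
    """Return (GPE, MFPE, SDFPE) over frames voiced in both ref and est.

    Assumptions: 0 marks an unvoiced frame; a gross error is a relative
    deviation above `threshold`; fine errors are absolute deviations in Hz.
    """
    pairs = [(r, e) for r, e in zip(ref, est) if r > 0 and e > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both reference and estimate")
    gross = [abs(e - r) > threshold * r for r, e in pairs]
    gpe = sum(gross) / len(pairs)
    fine = [abs(e - r) for (r, e), g in zip(pairs, gross) if not g]
    if not fine:
        return gpe, float("nan"), float("nan")
    return gpe, mean(fine), pstdev(fine)
```

For peaks at 200, 400, 600, and 800 Hz with equal weights, `harmonic_f0_from_peaks` returns 200.0. The 100-Hz subharmonic explains the same peaks but leaves half of its expected harmonics empty, so its score is halved. The `band` argument could be pointed at the 1000-3000 Hz region that the abstract describes as supplemental, but how SAFE actually combines the bands is not reproduced here.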
Pages: 933-944
Page count: 12