SAFE: A Statistical Approach to F0 Estimation Under Clean and Noisy Conditions

Cited by: 37
Authors
Chu, Wei [1]
Alwan, Abeer [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90095 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 03
Funding
National Science Foundation (USA);
Keywords
F0; estimation; noisy speech; pitch detection; statistical learning; FUNDAMENTAL-FREQUENCY ESTIMATION; PITCH; SPEECH; AUTOCORRELATION; ALGORITHM;
DOI
10.1109/TASL.2011.2168518
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A novel Statistical Algorithm for F0 Estimation (SAFE) is proposed to improve the accuracy of F0 estimation under both clean and noisy conditions. Prominent signal-to-noise ratio (SNR) peaks in speech spectra constitute a robust information source from which F0 can be inferred. A probabilistic framework is proposed to model the effect of noise on voiced speech spectra. Prominent SNR peaks in the low-frequency band (0-1000 Hz) are important to F0 estimation, and prominent SNR peaks in the middle and high-frequency bands (1000-3000 Hz) are also useful supplemental information to F0 estimation under noisy conditions, especially the babble noise condition. Experiments show that the SAFE algorithm has the lowest gross pitch errors (GPEs) compared to prevailing F0 trackers in white and babble noise conditions at low SNRs. Experimental results also show that SAFE is robust in maintaining a low mean and standard deviation of the fine pitch errors (MFPE and SDFPE) in noise. The code of SAFE is available at http://www.ee.ucla.edu/~weichu/safe.
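The abstract names three evaluation metrics (GPE, MFPE, SDFPE) without defining them. The Python sketch below illustrates one common way such metrics are computed; it is not taken from the SAFE code. The function name `pitch_error_metrics`, the 20% gross-error threshold, the convention that 0 marks an unvoiced frame, and the choice to score only frames voiced in both tracks are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def pitch_error_metrics(ref_f0, est_f0, gross_threshold=0.2):
    """Return (GPE in percent, MFPE in Hz, SDFPE in Hz).

    Assumptions (not from the paper): F0 values are in Hz, a value of 0
    marks an unvoiced frame, and a gross error is a relative deviation
    larger than `gross_threshold` from the reference.
    """
    # Score only frames that both the reference and the estimate call voiced.
    pairs = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both tracks")

    fine_errors = []
    gross_count = 0
    for r, e in pairs:
        if abs(e - r) / r > gross_threshold:
            gross_count += 1            # gross pitch error
        else:
            fine_errors.append(e - r)   # fine pitch error in Hz

    gpe = 100.0 * gross_count / len(pairs)
    # Mean and (population) standard deviation over the non-gross frames.
    mfpe = mean(fine_errors) if fine_errors else math.nan
    sdfpe = pstdev(fine_errors) if fine_errors else math.nan
    return gpe, mfpe, sdfpe


if __name__ == "__main__":
    reference = [100.0, 100.0, 200.0, 0.0, 150.0]
    estimate = [102.0, 150.0, 198.0, 120.0, 0.0]
    print(pitch_error_metrics(reference, estimate))
```

Published evaluations differ in the threshold, in how voicing disagreements are handled, and in whether fine errors are reported in Hz or as a percentage, so the definitions in the paper itself should take precedence over this sketch.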
Pages: 933-944
Number of pages: 12