Noise robust F0 determination and epoch-marking algorithms

Cited by: 8
Authors
Kotnik, Bojan [1]
Hoege, Harald [2]
Kacic, Zdravko [3]
Affiliations
[1] ULTRA Doo, Res Ctr Maribor, SI-2000 Maribor, Slovenia
[2] Siemens AG, Corp Technol, Profess Speech Proc, D-81739 Munich, Germany
[3] Univ Maribor, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia
Keywords
Fundamental frequency; Glottal closure instant; Epoch marking; Voicing detection; Artificial neural network; FUNDAMENTAL-FREQUENCY ESTIMATION; PITCH DETERMINATION; EXTRACTION; SPEECH;
DOI
10.1016/j.sigpro.2009.04.017
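Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract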
This paper presents CPDMA, a combined pitch frequency (F0) determination and epoch (pitch period) marking procedure that uses merged normalized forward-backward correlation. The algorithm consists of several processing steps: preprocessing of the input speech signal, voicing detection using artificial neural networks, an F0 determination stage based on normalized correlation, F0 contour postprocessing applying partial Viterbi traceback, and finally, epoch (or pitch period) marking. To evaluate the proposed CPDMA procedure against other algorithms, a manually segmented PDA/PMA reference database based on the real-life SPEECON Spanish speech database was created. A set of criteria was proposed to evaluate the performance of any PDA/PMA or voicing detection algorithm objectively and compactly. The performance of the proposed CPDMA was compared with that of the well-known, publicly available PRAAT toolkit. The PDA and PMA performances achieved with CPDMA were significantly better than those of the PRAAT toolkit in all of its considered configurations: autocorrelation method (PRAAT_AC), cross-correlation method (PRAAT_CC), SHS (PRAAT_SHS), and point process (PRAAT_PP). The superior noise robustness of CPDMA comes at the expense of a more complex algorithm and consequently a worse real-time factor than PRAAT. (C) 2009 Elsevier B.V. All rights reserved.
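The abstract names normalized correlation as the basis of the F0 determination stage. As a rough illustration only, the sketch below picks the lag that maximizes a plain normalized cross-correlation within a single frame. It is not the merged forward-backward variant used in CPDMA, whose definition is not given in this record. The function names, the 60-400 Hz search range, and the frame layout are all assumptions made for the example.

```python
import math


def normalized_correlation(frame, lag, window):
    """Normalized cross-correlation of frame[0:window] with frame[lag:lag+window]."""
    a = frame[:window]
    b = frame[lag:lag + window]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # Guard against silent (all-zero) segments.
    return num / den if den > 0.0 else 0.0


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return (f0_hz, score) for one frame; the search range is an assumed default."""
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    # The comparison window must fit inside the frame at the largest lag.
    window = len(frame) - lag_max
    if window <= 0:
        raise ValueError("frame too short for the requested f0_min")
    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        score = normalized_correlation(frame, lag, window)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag, best_score


# Usage: a synthetic 100 Hz sine sampled at 8 kHz (60 ms frame).
frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(480)]
f0, score = estimate_f0(frame, 8000)
```

Because the score is normalized by the energy of both segments, it stays in the range -1 to 1 regardless of signal level, which is one reason such a measure can also feed a voicing decision. A complete system along the lines the abstract describes would still need voicing detection, contour smoothing, and epoch marking on top of this per-frame estimate.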
Pages: 2555-2569
Page count: 15