Noise robust F0 determination and epoch-marking algorithms

Cited by: 8
Authors
Kotnik, Bojan [1]
Hoege, Harald [2]
Kacic, Zdravko [3]
Affiliations
[1] ULTRA Doo, Res Ctr Maribor, SI-2000 Maribor, Slovenia
[2] Siemens AG, Corp Technol, Profess Speech Proc, D-81739 Munich, Germany
[3] Univ Maribor, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia
Keywords
Fundamental frequency; Glottal closure instant; Epoch marking; Voicing detection; Artificial neural network; FUNDAMENTAL-FREQUENCY ESTIMATION; PITCH DETERMINATION; EXTRACTION; SPEECH
DOI
10.1016/j.sigpro.2009.04.017
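Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract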
This paper presents CPDMA, a combined pitch frequency (F0) determination and epoch (pitch period) marking procedure that uses merged normalized forward-backward correlation. The algorithm consists of several processing steps: preprocessing of the input speech signal, voicing detection using artificial neural networks, an F0 determination stage based on normalized correlation, F0 contour postprocessing applying partial Viterbi traceback, and finally epoch (or pitch period) marking. To evaluate the proposed CPDMA procedure against other algorithms, a manually segmented reference database for pitch determination algorithms (PDA) and pitch marking algorithms (PMA), based on the real-life SPEECON Spanish speech database, was created. A set of criteria is proposed to evaluate the performance of any PDA/PMA or voicing detection algorithm objectively and compactly. The performance of the proposed CPDMA was compared with that of the well-known and publicly available PRAAT toolkit. The PDA and PMA performance achieved with CPDMA significantly exceeded that of the PRAAT toolkit in all of its considered configurations: autocorrelation method (PRAAT_AC), cross-correlation method (PRAAT_CC), SHS (PRAAT_SHS), and point process (PRAAT_PP). The superior noise robustness of CPDMA is achieved at the expense of a more complex algorithm, which consequently leads to a worse real-time factor than PRAAT. © 2009 Elsevier B.V. All rights reserved.
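The abstract does not spell out how the forward and backward correlations are merged, so the following is only a minimal sketch of the general idea, not the authors' CPDMA implementation. It assumes that the two normalized correlations are simply averaged, that the F0 search range is 60 to 400 Hz, and that the highest-scoring lag is taken as the pitch period; the function names, parameters, and the averaging rule are all hypothetical.

```python
import math


def normalized_correlation(x, start, lag, n):
    """Normalized correlation between x[start:start+n] and the same-length
    window shifted forward by `lag` samples. Returns a value in [-1, 1]."""
    a = x[start:start + n]
    b = x[start + lag:start + lag + n]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def estimate_f0(x, center, fs, n, f0_min=60.0, f0_max=400.0):
    """Estimate F0 for the frame x[center:center+n].

    Assumption (not from the paper): forward and backward normalized
    correlations are merged by plain averaging. The caller must ensure
    center - fs/f0_min >= 0 and center + fs/f0_min + n <= len(x).
    """
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        # Forward: compare the frame with the window `lag` samples ahead.
        fwd = normalized_correlation(x, center, lag, n)
        # Backward: compare the window `lag` samples behind with the frame.
        bwd = normalized_correlation(x, center - lag, lag, n)
        score = 0.5 * (fwd + bwd)
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag, best_score


if __name__ == "__main__":
    fs = 8000
    # Synthetic 100 Hz tone standing in for a voiced speech frame.
    tone = [math.sin(2 * math.pi * 100 * i / fs) for i in range(2000)]
    f0, score = estimate_f0(tone, center=800, fs=fs, n=160)
    print(f0, score)
```

A real system along the lines described in the abstract would only run this on frames classified as voiced and would smooth the resulting F0 contour afterwards; neither step is shown here.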
Pages: 2555-2569
Page count: 15
Related papers
50 in total
  • [31] The contribution of changes in F0 and spectral tilt to increased intelligibility of speech produced in noise
    Lu, Youyi
    Cooke, Martin
    SPEECH COMMUNICATION, 2009, 51 (12) : 1253 - 1262
  • [32] Detection and F0 discrimination of harmonic complex tones in the presence of competing tones or noise
    Micheyl, Christophe
    Bernstein, Joshua G. W.
    Oxenham, Andrew J.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (03): : 1493 - 1505
  • [33] NOISE-ROBUST F0 ESTIMATION USING SNR-WEIGHTED SUMMARY CORRELOGRAMS FROM MULTI-BAND COMB FILTERS
    Tan, Lee Ngee
    Alwan, Abeer
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4464 - 4467
  • [35] Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency
    Arifianto, D
    Tanaka, T
    Masuko, T
    Kobayashi, T
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (12) : 2812 - 2820
  • [36] Robust F0 estimation based on complex LPC analysis for IRS filtered noisy speech
    Funaki, Keiichi
    Kinjo, Tatsuhiko
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2007, E90A (08) : 1579 - 1586
  • [37] Determination of f0 - a00 mixing angle from QCD sum rules
    Aliev, T. M.
    Bilmis, S.
    EUROPEAN PHYSICAL JOURNAL A, 2018, 54 (09):
  • [38] Automatic Determination of the Standard Chinese Prosodic Phrase Boundaries by F0 Generation Model
    Bu, Shehui
    Zhuo, Zhenjie
    Yang, Lingling
    Itahashi, Shuichi
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1400 - +
  • [39] Multi-Microphone Periodicity Function for Robust F0 Estimation in Real Noisy and Reverberant Environments
    Flego, Federico
    Omologo, Maurizio
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2146 - 2149
  • [40] Robust method for estimating F0 of complex tone based on pitch perception of amplitude modulated signal
    Miwa, Kenichiro
    Unoki, Masashi
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2311 - 2315