Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals

Cited by: 11
Authors
Gowda, Dhananjaya [1, 2]
Kadiri, Sudarsana Reddy [3]
Story, Brad [4]
Alku, Paavo [3]
Affiliations
[1] Aalto Univ, Espoo 02150, Finland
[2] Samsung Res, Seoul R&D Campus, Seoul 06765, South Korea
[3] Aalto Univ, Dept Signal Proc & Acoust, Espoo 02150, Finland
[4] Univ Arizona, Tucson, AZ 85721 USA
Funding
Academy of Finland;
Keywords
Time-varying linear prediction; weighted linear prediction; quasi-closed-phase analysis; formant tracking; LINEAR PREDICTION; SELECTION; MODEL;
DOI
10.1109/TASLP.2020.3000037
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy wherein an initial set of formant candidates is estimated using short-time analysis (e.g., 10-50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases the residual sparsity by using L1-norm optimization, and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100-200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories. Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner). Matlab scripts for the proposed method can be found at: https://github.com/njaygowda/ftrack
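The time-varying linear prediction idea mentioned in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation (their Matlab scripts are at the URL above) and it makes several simplifying assumptions: each predictor coefficient is modeled as a low-order polynomial in time over the whole window, the fit uses ordinary (optionally weighted) least squares rather than the L1-norm optimization described in the abstract, and the optional `weights` argument is only a generic stand-in for a quasi-closed-phase weighting function. The names `tvlp_coefficients` and `formant_frequencies` are hypothetical.

```python
import numpy as np


def tvlp_coefficients(x, order=2, basis_order=1, weights=None):
    """Fit x[n] ~ -sum_k a_k(n) x[n-k] with a_k(n) = sum_i b[k, i] * t(n)**i.

    Illustrative least-squares (L2) sketch; the method in the abstract uses
    an L1-norm criterion and a quasi-closed-phase weighting instead.
    Returns an array of shape (len(x), order) holding a_k(n) per sample.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(order, len(x))            # samples with a full history
    t = n / (len(x) - 1)                    # time normalized to [0, 1]
    basis = np.stack([t**i for i in range(basis_order + 1)], axis=1)
    past = np.stack([x[n - k] for k in range(1, order + 1)], axis=1)

    # One regressor per (lag k, basis function i) pair: x[n-k] * t(n)**i.
    A = (past[:, :, None] * basis[:, None, :]).reshape(len(n), -1)
    target = -x[n]

    if weights is not None:
        # Weighted least squares: scale each equation by sqrt(weight).
        w = np.sqrt(np.asarray(weights, dtype=float)[n])
        A = A * w[:, None]
        target = target * w

    b, *_ = np.linalg.lstsq(A, target, rcond=None)
    b = b.reshape(order, basis_order + 1)

    # Evaluate the coefficient trajectories at every sample of the window.
    t_full = np.arange(len(x)) / (len(x) - 1)
    basis_full = np.stack([t_full**i for i in range(basis_order + 1)], axis=1)
    return basis_full @ b.T


def formant_frequencies(a_n, fs):
    """Convert one sample's predictor coefficients to resonance frequencies (Hz)."""
    roots = np.roots(np.concatenate(([1.0], a_n)))
    roots = roots[np.imag(roots) > 0]       # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

Because the coefficients are constrained to smooth trajectories across the whole window, the resonance frequencies obtained by calling `formant_frequencies` on successive rows of the result vary continuously, which is the sense in which estimation and tracking happen in a single stage.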
Pages: 1901-1914
Number of pages: 14
Related papers
50 in total
  • [1] Time-varying quasi-closed-phase analysis for accurate formant tracking in speech signals
    Gowda, Dhananjaya
    Kadiri, Sudarsana Reddy
    Story, Brad
    Alku, Paavo
    arXiv, 2023,
  • [2] Time-varying quasi-closed-phase weighted linear prediction analysis of speech for accurate formant detection and tracking
    Gowda, Dhananjaya
    Alku, Paavo
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1760 - 1764
  • [3] QUASI CLOSED PHASE ANALYSIS OF SPEECH SIGNALS USING TIME VARYING WEIGHTED LINEAR PREDICTION FOR ACCURATE FORMANT TRACKING
    Gowda, Dhananjaya
    Airaksinen, Manu
    Alku, Paavo
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4980 - 4984
  • [4] Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation
    Gowda, Dhananjaya
    Airaksinen, Manu
    Alku, Paavo
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 142 (03): : 1542 - 1553
  • [5] ACCURATE REPRESENTATION OF TIME-VARYING SIGNALS USING MIXED TRANSFORMS WITH APPLICATIONS TO SPEECH
    MIKHAEL, WB
    SPANIAS, AS
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, 1989, 36 (02): : 329 - 331
  • [6] Allpass Modeling of Phase Spectrum of Speech Signals for Formant Tracking
    Vijayan, Karthika
    Murty, K. Sri Rama
    Li, Haizhou
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1190 - 1196
  • [7] URV ESPRIT FOR TRACKING TIME-VARYING SIGNALS
    LIU, KJR
    OLEARY, DP
    STEWART, GW
    WU, YJJ
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1994, 42 (12) : 3441 - 3448
  • [8] Tracking of time-varying frequency of sinusoidal signals
    Bencheqroune, A
    Benseddik, M
    Hajjari, A
    SIGNAL PROCESSING, 1999, 78 (02) : 191 - 199
  • [9] On polynomial phase signals with time-varying amplitudes
    Zhou, GT
    Giannakis, GB
    Swami, A
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1996, 44 (04) : 848 - 861
  • [10] On the Properties of a Time-Varying Quasi-Harmonic Model of Speech
    Pantazis, Yannis
    Rosec, Olivier
    Stylianou, Yannis
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1044 - +