Initialization, training, and context-dependency in HMM-based formant tracking

Cited by: 17
Authors
Toledano, DT [1]
Villardebó, JG
Gómez, LH
Affiliations
[1] Univ Autonoma Madrid, Escuela Politecn Super, ATVS, E-28049 Madrid, Spain
[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain
[3] Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomun, Madrid, Spain
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006, Vol. 14, No. 2
Keywords
automatic segmentation; formant tracking; speech analysis
DOI
10.1109/TSA.2005.857805
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper presents an algorithm for formant tracking using hidden Markov models (HMMs) and analyzes the influence of HMM initialization, training, and context-dependency on the accuracy of the resulting formant tracks. Formant trackers usually comprise two phases: one in which the speech is analyzed and formant candidates are obtained, and another in which the most likely formants are chosen by imposing different constraints. While the first phase usually relies on standard spectrum estimation techniques, the second has evolved notably in recent years. Traditionally, the second phase imposes continuity constraints on the formant selection process. More recently, research has sought to include phonemic knowledge in the second phase to make formant tracking more reliable. To incorporate phonemic knowledge, newer approaches make use of the orthographic transcription of the speech utterance. From the orthographic transcription, the phonemic transcription is obtained, and from this and the speech itself a phonemic segmentation can be derived. This phonemic segmentation, along with the phonemic transcription and some knowledge of the nominal formant positions for the different phonemes, provides extra information that can be used to obtain more accurate formant tracks. This paper presents a complete HMM-based, data-driven algorithm for formant tracking that can combine different levels of acoustic and phonemic information. The performance of this algorithm is analyzed in detail for different initialization strategies using different levels of knowledge, different degrees of training, and context-independent and context-dependent HMMs. Experimental speaker-dependent results show that the efficient use of phonemic information in HMM training and context-dependent modeling significantly reduces the formant tracking error rate, especially for formants F2 and F3.
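The traditional second phase mentioned in the abstract, choosing one formant per frame from a set of candidates under a continuity constraint, can be illustrated with a small dynamic-programming sketch. This is not the paper's HMM-based algorithm; it is a generic Viterbi-style selection written only to make the two-phase idea concrete. The function name `track_formant`, the absolute-difference cost terms, and the two weights are assumptions introduced for this illustration, and the candidate frequencies are made-up values.

```python
from typing import List, Sequence


def track_formant(candidates: Sequence[Sequence[float]],
                  nominal_hz: float,
                  continuity_weight: float = 1.0,
                  nominal_weight: float = 0.1) -> List[float]:
    """Pick one candidate per frame with a Viterbi-style dynamic program.

    Hypothetical cost model (an assumption, not taken from the paper):
      local cost      = nominal_weight * |f - nominal_hz|
      transition cost = continuity_weight * |f_t - f_(t-1)|
    """
    if not candidates or any(len(frame) == 0 for frame in candidates):
        raise ValueError("every frame needs at least one candidate")

    # cost[j]: best accumulated cost of a path ending at candidate j
    # of the frame processed most recently.
    cost = [nominal_weight * abs(f - nominal_hz) for f in candidates[0]]
    backptr: List[List[int]] = []

    for prev_frame, frame in zip(candidates, candidates[1:]):
        new_cost: List[float] = []
        pointers: List[int] = []
        for f in frame:
            # Cheapest predecessor, penalising frequency jumps between frames.
            best_i = min(
                range(len(prev_frame)),
                key=lambda i: cost[i] + continuity_weight * abs(f - prev_frame[i]),
            )
            step = cost[best_i] + continuity_weight * abs(f - prev_frame[best_i])
            new_cost.append(step + nominal_weight * abs(f - nominal_hz))
            pointers.append(best_i)
        cost = new_cost
        backptr.append(pointers)

    # Backtrack from the cheapest final candidate to recover the path.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(backptr):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][path[t]] for t in range(len(candidates))]


# Made-up candidates (Hz) for four frames; the third frame lists a spurious peak first.
frames = [[500, 1500], [520, 1480], [2000, 540], [530, 1500]]
print(track_formant(frames, nominal_hz=500))
```

In this sketch the `nominal_hz` term stands in, very loosely, for the "nominal formant positions" the abstract mentions. A phoneme-aware tracker would vary that value per segment rather than keep it fixed for the whole utterance.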
Pages: 511-523
Number of pages: 13