Initialization, training, and context-dependency in HMM-based formant tracking

Cited by: 17

Authors:

Toledano, DT [1]

Villardebó, JG

Gómez, LH

Affiliations:

[1] Univ Autonoma Madrid, Escuela Politecn Super, ATVS, E-28049 Madrid, Spain

[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain

[3] Univ Politecn Madrid, Escuela Tecn Super Ingn Telecomun, Madrid, Spain

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 2

Keywords:

automatic segmentation; formant tracking; speech analysis;

DOI:

10.1109/TSA.2005.857805

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents an algorithm for formant tracking using HMMs and analyzes the influence of HMM initialization, training, and context-dependency on the accuracy of the resulting formant tracks. Formant trackers usually include two phases: one in which the speech is analyzed and formant candidates are obtained, and another in which the most likely formants are chosen by imposing different constraints. While the first phase usually relies on standard spectrum estimation techniques, the second phase has evolved notably in recent years. Traditionally, the second phase imposes continuity constraints on the formant selection process. More recent research includes phonemic knowledge in the second phase to make formant tracking more reliable. To incorporate phonemic knowledge, newer approaches make use of the orthographic transcription of the speech utterance. The phonemic transcription is derived from the orthographic transcription, and a phonemic segmentation is then obtained from the phonemic transcription and the speech itself. This phonemic segmentation, along with the phonemic transcription and some knowledge of the nominal formant positions for the different phonemes, provides extra information that can be used to obtain more accurate formant tracks. This paper presents a complete HMM-based, data-driven algorithm for formant tracking that is suitable for combining different levels of acoustic and phonemic information. The performance of this algorithm is analyzed in detail for different initialization strategies using different levels of knowledge, different degrees of training, and context-independent and context-dependent HMMs. Experimental speaker-dependent results show that the efficient use of phonemic information in HMM training and context-dependent modeling significantly reduces the formant tracking error rate, especially for formants F2 and F3.
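
The second phase described in the abstract (choosing one formant per frame from a set of candidates under continuity and phonemic constraints) can be illustrated with a small dynamic-programming sketch. This is not the HMM-based algorithm of the paper, whose details the abstract does not give. It is a simplified Viterbi-style selection, and the function name `track_formant`, the two weights, and the absolute-difference cost terms are all assumptions made for illustration.

```python
def track_formant(candidates, nominal, continuity_weight=1.0, nominal_weight=0.1):
    """Pick one candidate frequency (Hz) per frame by dynamic programming.

    candidates: one non-empty list of candidate frequencies per frame.
    nominal: one nominal formant frequency per frame (assumed here to come
             from the phoneme label of that frame).
    The cost terms and weights are illustrative assumptions, not the paper's.
    """
    n = len(candidates)
    # cost[t][j]: lowest cumulative cost of any path ending at candidate j of frame t
    cost = [[nominal_weight * abs(c - nominal[0]) for c in candidates[0]]]
    # back[t][j]: index of the best predecessor candidate in frame t - 1
    back = [[None] * len(candidates[0])]

    for t in range(1, n):
        row_cost, row_back = [], []
        prev = candidates[t - 1]
        for c in candidates[t]:
            # Phonemic term: distance to the nominal position for this frame.
            local = nominal_weight * abs(c - nominal[t])
            # Continuity term: penalize jumps between consecutive frames.
            totals = [cost[t - 1][i] + continuity_weight * abs(c - prev[i])
                      for i in range(len(prev))]
            best_i = min(range(len(prev)), key=lambda i: totals[i])
            row_cost.append(totals[best_i] + local)
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=lambda i: cost[-1][i])
    path = []
    for t in range(n - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    path.reverse()
    return path


frames = [[500, 1500], [520, 1480], [1490, 540]]
low_track = track_formant(frames, nominal=[500, 500, 500])
high_track = track_formant(frames, nominal=[1500, 1500, 1500])
```

With the same candidates, changing only the nominal positions switches the selected track between the lower and the upper candidate sequence, which is the role the abstract assigns to phonemic knowledge.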


Pages: 511-523

Page count: 13
