A Covariance-Tying Technique for HMM-Based Speech Synthesis

Cited by: 10

Authors:

Oura, Keiichiro [1]

Zen, Heiga [1]

Nankaku, Yoshihiko [1]

Lee, Akinobu [1]

Tokuda, Keiichi [1]

Affiliations:

[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010 / Vol. E93D / No. 03

Keywords:

HMM; speech synthesis; decision tree; context-clustering; MDL criterion; embedded device;

DOI:

10.1587/transinf.E93.D.595

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
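
To make the footprint argument in the abstract concrete, the following is a minimal sketch of how tying covariances changes the number of stored parameters. It assumes diagonal covariance matrices and uses hypothetical model sizes; `gaussian_param_count`, `num_states`, and `dim` are illustrative names and values, not taken from the paper. It only counts parameters and does not model the paper's mean-vector clustering procedure.

```python
# Illustrative parameter count only. The sizes below are hypothetical,
# not values reported in the paper.


def gaussian_param_count(num_states: int, dim: int, tied_covariance: bool) -> int:
    """Count stored values for Gaussian state distributions.

    Assumes diagonal covariances, so each covariance is a vector of `dim` variances.
    """
    mean_params = num_states * dim
    # Tied: one variance vector shared by all states; untied: one per state.
    var_params = dim if tied_covariance else num_states * dim
    return mean_params + var_params


num_states = 1000  # hypothetical number of clustered state distributions
dim = 120          # hypothetical feature dimension

untied = gaussian_param_count(num_states, dim, tied_covariance=False)
tied = gaussian_param_count(num_states, dim, tied_covariance=True)

print(f"untied: {untied} values, tied: {tied} values, ratio: {tied / untied:.3f}")
```

Under these assumptions, tying removes almost all variance storage, so the stored model is roughly half the size for the same number of mean vectors. Whether quality is retained is the question the paper's listening tests address.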


Pages: 595-601

Page count: 7

