A Covariance-Tying Technique for HMM-Based Speech Synthesis

被引：10

作者：

Oura, Keiichiro ^{[1
]}

Zen, Heiga ^{[1
]}

Nankaku, Yoshihiko ^{[1
]}

Lee, Akinobu ^{[1
]}

Tokuda, Keiichi ^{[1
]}

机构：

[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan

来源：

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010年 / E93D卷 / 03期

关键词：

HMM; speech synthesis; decision tree; context-clustering; MDL criterion; embedded device;

D O I：

10.1587/transinf.E93.D.595

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.

引用

页码：595 / 601

页数：7

共 50 条

[21] State duration modeling for HMM-based speech synthesis
Zen, Heiga
Masuko, Takashi
Tokuda, Keiichi
Yoshimura, Takayoshi
Kobayasih, Takao
Kitamura, Tadashi
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (03): : 692 - 693
[22] Analysis and HMM-based synthesis of hypo and hyperarticulated speech
Picart, Benjamin
Drugman, Thomas
Dutoit, Thierry
COMPUTER SPEECH AND LANGUAGE, 2014, 28 (02) : 687 - 707
[23] The Design and Implementation of HMM-based Dai Speech Synthesis
Wang, Zhan
Yang, Jian
Yang, Xin
2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
[24] REACTIVE AND CONTINUOUS CONTROL OF HMM-BASED SPEECH SYNTHESIS
Astrinaki, Maria
d'Alessandro, Nicolas
Picart, Benjamin
Drugman, Thomas
Dutoit, Thierry
2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 252 - 257
[25] Contextual Additive Structure for HMM-Based Speech Synthesis
Takaki, Shinji
Nankaku, Yoshihiko
Tokuda, Keiichi
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 229 - 238
[26] Parameterization of Vocal Fry in HMM-Based Speech Synthesis
Silen, Hanna
Helander, Elina
Nurminen, Jani
Gabbouj, Moncef
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1735 - +
[27] A training method of average voice model for HMM-based speech synthesis
Yamagishi, J
Tamura, M
Masuko, T
Tokuda, K
Kobayashi, T
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2003, E86A (08) : 1956 - 1963
[28] SPEECH-LAUGHS: AN HMM-BASED APPROACH FOR AMUSED SPEECH SYNTHESIS
El Haddad, Kevin
Dupont, Stephane
Urbain, Jerome
Dutoit, Thierry
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4939 - 4943
[29] HMM-based Tibetan Lhasa Speech Synthesis System
Wu Zhiqiang
Yu Hongzhi
Li Guanyu
Wan Shuhui
2013 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2013, : 92 - 95
[30] CONTEXTUAL PARTIAL ADDITIVE STRUCTURE FOR HMM-BASED SPEECH SYNTHESIS
Takaki, Shinji
Nankaku, Yoshihiko
Tokuda, Keiichi
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7878 - 7882

← 1 2 3 4 5 →