A hidden semi-Markov model-based speech synthesis system

Cited by: 154
Authors
Zen, Heiga [1]
Tokuda, Keiichi
Masuko, Takashi
Kobayashi, Takao
Kitamura, Tadashi
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Engn Sci, Yokohama, Kanagawa 2268502, Japan
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2007 / Vol. E90D / No. 05
Keywords
hidden Markov model; hidden semi-Markov model; HMM-based speech synthesis; MAXIMUM-LIKELIHOOD; DURATION; HMM;
DOI
10.1093/ietisy/e90-d.5.825
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, the spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can resolve the above inconsistency because the state duration PDFs can be incorporated explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that the use of HSMMs improves the reported naturalness of synthesized speech.
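The contrast between the two duration models described in the abstract can be illustrated with a minimal sketch. An HMM state has no explicit duration model: its duration distribution is implied by the self-transition probability and is geometric. An HSMM state carries its own duration PDF. The abstract does not say which distribution family is used, so the Gaussian below, the function names, and all numeric values are assumptions chosen for illustration only.

```python
import math


def implicit_hmm_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by an HMM self-transition probability.

    Staying d frames means d - 1 self-loops followed by one exit,
    which gives a geometric distribution (truncated at max_duration).
    """
    a = self_loop_prob
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


def explicit_gaussian_duration_pmf(mean, variance, max_duration):
    """Explicit duration pmf for an HSMM state (illustrative assumption).

    A Gaussian density is evaluated at d = 1..max_duration and
    renormalised so that the values sum to one.
    """
    weights = [
        math.exp(-0.5 * (d - mean) ** 2 / variance)
        for d in range(1, max_duration + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_duration(pmf):
    """Return the duration (in frames, starting at 1) with the highest probability."""
    return 1 + max(range(len(pmf)), key=pmf.__getitem__)


# Hypothetical values: self-loop probability 0.8, duration mean 5 frames.
hmm_pmf = implicit_hmm_duration_pmf(0.8, 50)
hsmm_pmf = explicit_gaussian_duration_pmf(5.0, 2.0, 50)
```

With these values the geometric distribution always peaks at a single frame, whatever the self-loop probability, while the explicit distribution peaks at its mean. Using the explicit form only at synthesis time, and the implicit form at training time, is the mismatch the abstract refers to.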
Pages: 825-834
Number of pages: 10