Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Cited by: 0
Authors
Soheil Khorram
Hossein Sameti
Fahimeh Bahmaninezhad
Simon King
Thomas Drugman
Affiliations
[1] Sharif University of Technology, Department of Computer Engineering
[2] University of Edinburgh, Centre for Speech Technology Research
[3] TCTS Lab, Faculté Polytechnique de Mons
Source
EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2014
Keywords
Hidden Markov model (HMM)-based speech synthesis; Context-dependent acoustic modeling; Decision tree-based context clustering; Maximum entropy; Overlapped context clusters; Statistical parametric speech synthesis
DOI
Not available
Abstract
Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent the probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) it lacks adequate context generalization; (ii) it is unable to express complex context dependencies; and (iii) the parameters generated from it exhibit abrupt transitions between adjacent states. To alleviate these limitations, many previous studies applied multiple decision trees with an additive assumption over those trees. The current study also uses multiple decision trees, but instead of the additive assumption, it proposes training the smoothest distribution by maximizing an entropy measure. Increasing the smoothness of the distribution improves context generalization. The proposed model, named the hidden maximum entropy model (HMEM), estimates a distribution that maximizes entropy subject to multiple moment-based constraints. Because it uses multiple decision trees and the maximum entropy measure together, HMEM considerably alleviates all three issues. Based on HMEM, a novel speech synthesis system has been developed with maximum likelihood (ML) parameter re-estimation and maximum output probability parameter generation. Additionally, a fast and effective algorithm for building multiple decision trees in parallel is devised. Two sets of experiments have been conducted to evaluate the performance of the proposed system. In the first set of experiments, HMEM is implemented with a set of heuristic context clusters. This system outperformed the decision tree structure on small training databases (i.e., 50, 100, and 200 sentences).
In the second set of experiments, the performance of HMEM with four parallel decision trees is investigated using both subjective and objective tests. All evaluation results of the second set of experiments confirm a significant improvement of the proposed system over the conventional HSMM.
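The abstract says that HMEM estimates a maximum entropy distribution subject to moment constraints defined over multiple, overlapping context clusters. A standard result applies here. Under first- and second-order moment constraints, the maximum entropy solution has a log-linear (exponential-family) form. Each cluster that a context falls into therefore adds to the natural parameters of a Gaussian, and does not add to its mean directly. The sketch below covers only that combination step. It assumes diagonal covariances and leaf parameters stored directly in natural form (b, a). The names Leaf, Tree and hmem_gaussian, the two toy trees and all numbers are hypothetical. The paper's own parametrization, ML re-estimation and parallel tree-building procedures are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical context representation: contextual factor name -> value.
Context = Dict[str, object]


@dataclass
class Leaf:
    """Natural parameters contributed by one context cluster (diagonal case, assumed).

    b: linear term (precision-weighted mean), one value per feature dimension.
    a: precision term, one non-negative value per feature dimension.
    """
    b: List[float]
    a: List[float]


@dataclass
class Tree:
    """A toy stand-in for a decision tree: maps a context to the leaf it falls into."""
    assign: Callable[[Context], Leaf]


def hmem_gaussian(trees: Sequence[Tree], ctx: Context) -> Tuple[List[float], List[float]]:
    """Combine the active leaves of all trees for one context.

    Assumed log-linear form: p(o | ctx) is proportional to
    exp(sum_k [b_k * o - 0.5 * a_k * o**2]) over the active leaves k.
    Completing the square gives a Gaussian with
    precision sum(a_k) and mean sum(b_k) / sum(a_k).
    """
    leaves = [tree.assign(ctx) for tree in trees]
    dim = len(leaves[0].b)
    mean: List[float] = []
    var: List[float] = []
    for d in range(dim):
        a = sum(leaf.a[d] for leaf in leaves)  # summed precision
        b = sum(leaf.b[d] for leaf in leaves)  # summed linear term
        if a <= 0:
            raise ValueError("combined precision must be positive")
        mean.append(b / a)
        var.append(1.0 / a)
    return mean, var


# Illustrative usage: two hypothetical trees with made-up leaf parameters.
phone_tree = Tree(lambda c: Leaf([2.0], [1.0]) if c["phone"] == "a" else Leaf([0.0], [1.0]))
stress_tree = Tree(lambda c: Leaf([4.0], [1.0]) if c["stress"] == 1 else Leaf([0.0], [0.5]))

mean, var = hmem_gaussian([phone_tree, stress_tree], {"phone": "a", "stress": 1})
print(mean, var)  # [3.0] [0.5]
```

The phone tree and the stress tree partition the context space differently. Two contexts that share a leaf in one tree but not in the other therefore receive related but distinct distributions. With a single tree, each context maps to exactly one leaf. The summation also happens in the natural-parameter domain, so a cluster with a larger precision pulls the combined mean more strongly.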