Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Cited by: 0

Authors:

Soheil Khorram

Hossein Sameti

Fahimeh Bahmaninezhad

Simon King

Thomas Drugman

Affiliations:

[1] Sharif University of Technology, Department of Computer Engineering

[2] University of Edinburgh, Centre for Speech Technology Research

[3] TCTS Lab, Faculté Polytechnique de Mons

Source:

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2014

Keywords:

Hidden Markov model (HMM)-based speech synthesis; Context-dependent acoustic modeling; Decision tree-based context clustering; Maximum entropy; Overlapped context clusters; Statistical parametric speech synthesis

Abstract:

Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) the decision tree structure lacks adequate context generalization; (ii) it is unable to express complex context dependencies; (iii) parameters generated from this structure exhibit abrupt transitions between adjacent states. To alleviate these limitations, many previous studies applied multiple decision trees with an additive assumption over those trees. The current study also uses multiple decision trees, but instead of the additive assumption, it proposes training the smoothest distribution by maximizing an entropy measure. Increasing the smoothness of the distribution improves context generalization. The proposed model, named the hidden maximum entropy model (HMEM), estimates a distribution that maximizes entropy subject to multiple moment-based constraints. Owing to the simultaneous use of multiple decision trees and the maximum entropy measure, the three aforementioned issues are considerably alleviated. Relying on HMEM, a novel speech synthesis system has been developed with maximum likelihood (ML) parameter re-estimation as well as maximum output probability parameter generation. Additionally, an effective and fast algorithm that builds multiple decision trees in parallel is devised. Two sets of experiments have been conducted to evaluate the performance of the proposed system. In the first set of experiments, HMEM with some heuristic context clusters is implemented. This system outperformed the decision tree structure on small training databases (i.e., 50, 100, and 200 sentences). In the second set of experiments, the performance of HMEM with four parallel decision trees is investigated using both subjective and objective tests. All evaluation results of the second set of experiments confirm a significant improvement of the proposed system over the conventional HSMM.
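
As general background for the phrase "maximizes entropy subject to multiple moment-based constraints", the standard textbook form of the maximum entropy problem can be sketched as follows. This is not the paper's specific HMEM formulation: the feature functions f_i, target moments \mu_i, multipliers \lambda_i, and the count K are notation assumed here purely for illustration.

```latex
\begin{aligned}
&\max_{p}\; H(p) = -\int p(x)\,\log p(x)\,dx \\
&\text{subject to}\quad \int p(x)\,dx = 1, \qquad
  \int p(x)\,f_i(x)\,dx = \mu_i,\quad i = 1,\dots,K \\
&\Longrightarrow\quad
  p^{*}(x) = \frac{1}{Z(\lambda)}\exp\!\Big(\sum_{i=1}^{K}\lambda_i f_i(x)\Big),
  \qquad
  Z(\lambda) = \int \exp\!\Big(\sum_{i=1}^{K}\lambda_i f_i(x)\Big)\,dx
\end{aligned}
```

For instance, with f_1(x) = x and f_2(x) = x^2 on the real line, the maximizer is a Gaussian whose mean and variance match the constrained moments. How the paper ties such constraints to its context clusters is not stated in the abstract.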
