A POLYNOMIAL SEGMENT MODEL BASED STATISTICAL PARAMETRIC SPEECH SYNTHESIS SYSTEM

Cited by: 0

Authors:

Sun, Jingwei [1]

Ding, Feng [1]

Wu, Yahui [1]

Affiliations:

[1] Nokia Res, Beijing, Peoples R China

Source:

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009

Keywords:

Hidden Markov Model; Polynomial Segment Model; statistical parametric speech synthesis; mean trajectory;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we present a statistical parametric speech synthesis system based on the polynomial segment model (PSM). As one of the segmental models for speech signals, the PSM explicitly describes the trajectory of the features in a speech segment and preserves the internal dynamics of the segment. In this work, spectral and excitation parameters are modeled simultaneously by PSMs, while the duration of each segment is modeled by a single Gaussian distribution. A top-down K-means clustering technique is applied for model tying. Mean trajectories obtained from the PSMs are used directly to generate speech parameters according to the estimated segment duration. An English speech synthesizer back-end is implemented on the CMU Arctic corpus, and the performance of the new approach is compared with that of the classical HMM-based one. Experimental results show that PSM modeling can achieve naturalness and intelligibility of the synthetic speech similar to those of HMM modeling. The system is at an early stage of development.
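The parameter generation step described in the abstract (sampling a PSM mean trajectory over the estimated segment duration) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common PSM convention of a polynomial in normalized time running from 0 to 1 across the segment, and it assumes the duration estimate is simply the rounded mean of the duration Gaussian. The function names and the coefficient layout are hypothetical.

```python
def mean_trajectory(coeffs, num_frames):
    """Evaluate a polynomial mean trajectory at num_frames normalized times.

    coeffs[r][d] is the coefficient of t**r for feature dimension d
    (a hypothetical layout chosen for this sketch).
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    dim = len(coeffs[0])
    frames = []
    for n in range(num_frames):
        # Normalized time: 0 at the first frame, 1 at the last (assumption).
        t = n / (num_frames - 1) if num_frames > 1 else 0.0
        frames.append([sum(b[d] * t ** r for r, b in enumerate(coeffs))
                       for d in range(dim)])
    return frames


def synthesize_segment(coeffs, duration_mean):
    """Use the rounded duration mean (in frames) as the segment length.

    Taking the Gaussian mean as the duration estimate is an assumption.
    """
    num_frames = max(1, round(duration_mean))
    return mean_trajectory(coeffs, num_frames)


# Linear trajectory for a 2-dimensional feature: c(t) = [1, 0] + t * [2, -1]
coeffs = [[1.0, 0.0], [2.0, -1.0]]
frames = synthesize_segment(coeffs, duration_mean=4.6)
```

Because the trajectory is defined on normalized time, the same coefficients yield a parameter sequence of any length, so a longer estimated duration stretches the same shape over more frames.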


Pages: 4021-4024

Page count: 4


