A POLYNOMIAL SEGMENT MODEL BASED STATISTICAL PARAMETRIC SPEECH SYNTHESIS SYSTEM

Cited by: 0
Authors
Sun, Jingwei [1]
Ding, Feng [1]
Wu, Yahui [1]
Affiliations
[1] Nokia Res, Beijing, Peoples R China
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
Hidden Markov Model; Polynomial Segment Model; statistical parametric speech synthesis; mean trajectory;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In this paper, we present a statistical parametric speech synthesis system based on the polynomial segment model (PSM). As one of the segmental models for speech signals, PSM explicitly describes the trajectory of the features in a speech segment and preserves the internal dynamics of the segment. In this work, spectral and excitation parameters are modeled by PSMs simultaneously, while the duration of each segment is modeled by a single Gaussian distribution. A top-down K-means clustering technique is applied for model tying. Mean trajectories acquired from PSMs are used directly to generate speech parameters according to the estimated segment duration. An English speech synthesizer back-end is implemented on the CMU Arctic corpus, and the performance of the new approach is compared with that of the classical HMM-based one. Experimental results show that PSM modeling can achieve naturalness and intelligibility of the synthetic speech similar to those of HMM modeling. The system is in the early stage of its development.
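The idea of generating parameters from a mean trajectory stretched to an estimated duration can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`design_matrix`, `fit_segment`, `mean_trajectory`), the least-squares fit, the normalization of time to [0, 1], and the polynomial order of 2 are all assumptions made for illustration, and model tying and variance modeling are left out.

```python
import numpy as np

def design_matrix(num_frames, order):
    """Rows are [1, t, t^2, ...], with t normalized to [0, 1] over the segment."""
    t = np.zeros(1) if num_frames == 1 else np.linspace(0.0, 1.0, num_frames)
    return np.vander(t, order + 1, increasing=True)

def fit_segment(features, order=2):
    """Least-squares polynomial coefficients for one segment (hypothetical helper).

    features: array of shape (num_frames, dim).
    Returns an array of shape (order + 1, dim).
    """
    Z = design_matrix(features.shape[0], order)
    coeffs, *_ = np.linalg.lstsq(Z, features, rcond=None)
    return coeffs

def mean_trajectory(coeffs, duration):
    """Evaluate the polynomial mean trajectory at `duration` frames."""
    order = coeffs.shape[0] - 1
    return design_matrix(duration, order) @ coeffs

# Toy 2-dimensional segment of 10 frames following known polynomials.
t = np.linspace(0.0, 1.0, 10)
segment = np.stack([1.0 + 2.0 * t + 3.0 * t**2, -t], axis=1)

coeffs = fit_segment(segment, order=2)
# Regenerate the same trajectory shape at a different (e.g. predicted) duration.
generated = mean_trajectory(coeffs, duration=5)
```

Because time is normalized within the segment, the same coefficients yield a trajectory of any length, which is what lets a separately estimated duration control how many frames are generated.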
Pages: 4021-4024
Number of pages: 4