A POLYNOMIAL SEGMENT MODEL BASED STATISTICAL PARAMETRIC SPEECH SYNTHESIS SYSTEM

Times cited: 0
Authors
Sun, Jingwei [1]
Ding, Feng [1]
Wu, Yahui [1]
Affiliation
[1] Nokia Res, Beijing, Peoples R China
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
Hidden Markov Model; Polynomial Segment Model; statistical parametric speech synthesis; mean trajectory
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we present a statistical parametric speech synthesis system based on the polynomial segment model (PSM). As a segmental model of speech, the PSM explicitly describes the trajectory of the features within a speech segment and thereby preserves the segment's internal dynamics. In this work, spectral and excitation parameters are modeled by PSMs simultaneously, while the duration of each segment is modeled by a single Gaussian distribution. A top-down K-means clustering technique is applied for model tying. Mean trajectories obtained from the PSMs are used directly to generate speech parameters according to the estimated segment durations. An English speech synthesizer back-end is implemented on the CMU ARCTIC corpus, and its performance is compared with that of the classical HMM-based approach. Experimental results show that PSM modeling achieves naturalness and intelligibility of the synthetic speech similar to those of HMM modeling. The system is at an early stage of development.
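The generation step described in the abstract (sampling each segment's PSM mean trajectory at its estimated duration) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard PSM form, in which the mean trajectory is a polynomial in time normalized to [0, 1], written mu(t) = Z(t)B. It also assumes that the frame count of a segment is the rounded mean of its duration Gaussian, and that segments are simply concatenated with no smoothing at their boundaries. All function names and example coefficients are hypothetical.

```python
from typing import List, Sequence, Tuple

def design_row(t: float, order: int) -> List[float]:
    """Polynomial basis Z(t) = [1, t, t^2, ..., t^order] at normalized time t."""
    return [t ** r for r in range(order + 1)]

def segment_frames(mean_duration: float) -> int:
    """Frame count for a segment. Assumption: use the mean of the duration
    Gaussian, rounded to an integer (Python rounds halves to even), at least 1."""
    return max(1, round(mean_duration))

def mean_trajectory(coeffs: Sequence[Sequence[float]], num_frames: int) -> List[List[float]]:
    """Sample mu(t) = Z(t) B at num_frames equally spaced times in [0, 1].

    coeffs is B with shape (order + 1) x dim: row r holds the t^r
    coefficient for every feature dimension.
    """
    order = len(coeffs) - 1
    dim = len(coeffs[0])
    frames = []
    for n in range(num_frames):
        # Assumption: time is normalized so the first frame is t=0 and the last is t=1.
        t = n / (num_frames - 1) if num_frames > 1 else 0.0
        z = design_row(t, order)
        frames.append([sum(z[r] * coeffs[r][d] for r in range(order + 1)) for d in range(dim)])
    return frames

def generate_parameters(segments: Sequence[Tuple[Sequence[Sequence[float]], float]]) -> List[List[float]]:
    """Concatenate mean trajectories for (coeffs, mean_duration) pairs.
    Hypothetical helper: no cross-segment smoothing is applied."""
    out: List[List[float]] = []
    for coeffs, mean_duration in segments:
        out.extend(mean_trajectory(coeffs, segment_frames(mean_duration)))
    return out

if __name__ == "__main__":
    # Hypothetical quadratic PSM with 2 feature dimensions.
    B = [[1.0, 0.0], [2.0, 1.0], [-2.0, 0.5]]
    for frame in mean_trajectory(B, segment_frames(4.6)):
        print(frame)
```

With a predicted duration of 4.6 frames, the segment is rendered as 5 frames. For example, the first dimension follows 1 + 2t - 2t^2 and peaks at 1.5 at the segment midpoint. A real system would derive the frame count from a duration model and might smooth across segment boundaries. The sketch omits both refinements so that the direct use of mean trajectories stays visible.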
Pages: 4021-4024
Number of pages: 4