Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis

Times cited: 89
Authors
Yu, Kai [1]
Young, Steve [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 5
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
F0; modeling; hidden Markov model (HMM)-based synthesis; statistical parametric speech synthesis; voicing classification; FREQUENCY; SYSTEM
DOI
10.1109/TASL.2010.2076805
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The modeling of fundamental frequency, or F0, in HMM-based speech synthesis is a critical factor in delivering speech that both sounds natural and accurately conveys the many nuances of the message. However, F0 modeling is difficult because F0 values are normally considered to depend on a binary voicing decision, such that they are continuous in voiced regions and undefined in unvoiced regions. F0 is therefore a discontinuous function of time. The multi-space probability distribution HMM (MSDHMM) is a widely used solution to this problem. The MSDHMM essentially uses a joint distribution of discrete voicing labels and discontinuous F0 observations. However, because of the discontinuity assumption, the MSDHMM provides a rather weak model of the F0 trajectory. In this paper, F0 is viewed as a continuous function of time; this is achieved by assuming that F0 can be observed within unvoiced regions as well as voiced regions. This provides a continuous F0 data stream that can be modeled by standard HMMs. Voicing labels are modeled either implicitly or explicitly in order to perform voicing classification, and a globally tied distribution (GTD) technique is used to achieve robust F0 estimation. Both objective measures and subjective listening tests demonstrate that continuous F0 modeling yields better synthesized F0 trajectories and significant improvements in the naturalness of synthesized speech compared with the MSDHMM.
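The abstract's central idea, treating F0 as observable in unvoiced regions so that a standard HMM can model one continuous stream, can be illustrated with a minimal Python sketch. It rests on assumptions that do not come from the paper: unvoiced frames are filled by linear interpolation of log-F0 (a simple stand-in, since the abstract does not say how unvoiced-region F0 is obtained), each HMM state output is a single Gaussian, and the explicit voicing label is treated as independent of F0 given the state. All function names are hypothetical, and the globally tied distribution (GTD) technique is not shown. The two likelihood functions contrast an MSD-style state output, where unvoiced frames carry no F0 value, with a continuous-F0 output, where every frame contributes an F0 term.

```python
import bisect
import math

# Hypothetical helper names; an illustrative sketch, not the authors' method.


def continuous_log_f0(f0):
    """Turn a frame-level F0 track (Hz, 0 = unvoiced) into a continuous
    log-F0 stream plus a binary voicing label per frame.

    Assumption: unvoiced frames are filled by linear interpolation of
    log-F0 between the nearest voiced neighbours; leading and trailing
    unvoiced runs copy the nearest voiced value.
    """
    voiced = [x > 0 for x in f0]
    voiced_idx = [i for i, v in enumerate(voiced) if v]
    if not voiced_idx:
        raise ValueError("need at least one voiced frame")
    log_f0 = [math.log(x) if v else 0.0 for x, v in zip(f0, voiced)]
    out = list(log_f0)
    for i, v in enumerate(voiced):
        if v:
            continue
        # voiced_idx is sorted; pos points at the first voiced frame after i.
        pos = bisect.bisect_left(voiced_idx, i)
        left = voiced_idx[pos - 1] if pos > 0 else None
        right = voiced_idx[pos] if pos < len(voiced_idx) else None
        if left is None:
            out[i] = log_f0[right]
        elif right is None:
            out[i] = log_f0[left]
        else:
            t = (i - left) / (right - left)
            out[i] = (1 - t) * log_f0[left] + t * log_f0[right]
    return out, voiced


def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def msd_frame_loglik(log_f0, voiced, w_voiced, mean, var):
    """MSD-style state output for one frame: a voiced frame scores the
    voiced-space weight times a Gaussian on log-F0; an unvoiced frame has
    no F0 value and scores only the unvoiced (zero-dimensional) weight."""
    if voiced:
        return math.log(w_voiced) + gauss_logpdf(log_f0, mean, var)
    return math.log(1.0 - w_voiced)


def continuous_frame_loglik(log_f0, voiced, p_voiced, mean, var):
    """Continuous-F0 state output with an explicit voicing label.
    Simplifying assumption: label and F0 are independent given the state,
    so every frame, voiced or not, contributes a Gaussian F0 term."""
    p_label = p_voiced if voiced else 1.0 - p_voiced
    return math.log(p_label) + gauss_logpdf(log_f0, mean, var)


if __name__ == "__main__":
    f0 = [0.0, 100.0, 0.0, 0.0, 200.0, 0.0]  # toy track in Hz
    lf0, v = continuous_log_f0(f0)
    print([round(math.exp(x), 2) for x in lf0])
    # → [100.0, 100.0, 125.99, 158.74, 200.0, 200.0]
```

In the MSD-style function an unvoiced frame contributes only a weight, so it places no constraint on the F0 Gaussian; in the continuous version every frame does, which is one way to see why a continuous stream can support a stronger trajectory model.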
Pages: 1071-1079
Number of pages: 9
Related papers (50 in total)
  • [1] Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis
    Lei, Ming
    Wu, Yi-Jian
    Ling, Zhen-Hua
    Dai, Li-Rong
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 613 - +
  • [2] Asynchronous F0 and Spectrum Modeling for HMM-Based Speech Synthesis
    Wang, Cheng-Cheng
    Ling, Zhen-Hua
    Dai, Li-Rong
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 412 - 415
  • [3] A Hierarchical F0 Modeling Method for HMM-based Speech Synthesis
    Lei, Ming
    Wu, Yi-Jian
    Soong, Frank K.
    Ling, Zhen-Hua
    Dai, Li-Rong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2170 - +
  • [4] JOINT MODELLING OF VOICING LABEL AND CONTINUOUS F0 FOR HMM BASED SPEECH SYNTHESIS
    Yu, K.
    Young, S.
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4572 - 4575
  • [5] Soft context clustering for F0 modeling in HMM-based speech synthesis
    Khorram, Soheil
    Sameti, Hossein
    King, Simon
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015
  • [6] MULTI-LAYER F0 MODELING FOR HMM-BASED SPEECH SYNTHESIS
    Wang, Cheng-Cheng
    Ling, Zhen-Hua
    Zhang, Bu-Fan
    Dai, Li-Rong
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 129 - 132
  • [7] Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (08) : 1406 - 1419
  • [8] CROSS-STREAM DEPENDENCY MODELING USING CONTINUOUS F0 MODEL FOR HMM-BASED SPEECH SYNTHESIS
    Wang, Xin
    Ling, Zhen-Hua
    Dai, Li-Rong
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 84 - 87
  • [9] Review of F0 modelling and generation in HMM based speech synthesis
    Yu, Kai
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 599 - 604