Review of F0 modelling and generation in HMM based speech synthesis

Cited by: 0
Authors
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
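Keywords
statistical speech synthesis; HMM based synthesis; F0; modelling
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809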
Abstract
Fundamental frequency, or F0, is a critical factor in synthesising speech that is both natural and expressive. In HMM based speech synthesis, the modelling and generation of F0 is one of the key difficulties that differentiate synthesis from recognition. First, F0 is normally considered a discontinuous function of time, whose domain is partly continuous and partly discrete. This leaves two issues to be addressed in F0 modelling and generation: the voiced/unvoiced decision and the F0 trajectory. Second, F0 is supra-segmental, which means it should be modelled beyond the traditional phoneme level. Third, the purpose of F0 modelling is not only to produce high quality synthetic speech in general, but also to deliver expressiveness effectively. This requires explicitly linking F0 modelling to (para-/non-)linguistic information so that F0 can be controlled easily. This paper reviews the state-of-the-art frameworks that address these issues. Possible future research directions are also discussed.
Pages: 599-604
Page count: 6