METHODS FOR APPLYING DYNAMIC SINUSOIDAL MODELS TO STATISTICAL PARAMETRIC SPEECH SYNTHESIS

Citations: 0
Authors
Hu, Qiong [1]
Stylianou, Yannis [2]
Maia, Ranniery [2]
Richmond, Korin [1]
Yamagishi, Junichi [1, 3]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[2] Toshiba Res Europe Ltd, Cambridge, England
[3] Natl Inst Informat, Tokyo, Japan
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Sinusoidal model; Parametric statistical speech synthesis; Discrete cepstra; Quality;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Sinusoidal vocoders can generate high-quality speech, but they have not been extensively applied to statistical parametric speech synthesis. This paper presents two ways of using dynamic sinusoidal models for statistical speech synthesis, enabling the sinusoid parameters to be modelled in HMM-based synthesis. In the first method, features extracted from a fixed- and low-dimensional, perception-based dynamic sinusoidal model (PDM) are statistically modelled directly. In the second method, we convert both the static amplitude and the dynamic slope from all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients) for modelling. During synthesis, HDM is then used to reconstruct speech. We have compared the voice quality of these two methods to the STRAIGHT cepstrum-based vocoder with mixed excitation in formal listening tests. Our results show that HDM with intermediate parameters can generate speech of comparable quality to STRAIGHT, while PDM with direct modelling seems promising in terms of producing good speech quality without resorting to intermediate parameters such as cepstra.
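To illustrate the idea of converting harmonic amplitudes to regularised cepstral coefficients, the following is a minimal sketch of a regularised discrete cepstrum fit. It is not the authors' implementation: the function names, the cepstral order, the regularisation weight `lam`, and the quadratic smoothness penalty are all assumptions chosen for illustration, and this record does not specify the paper's actual parameterisation.

```python
import numpy as np


def _cosine_basis(freqs_hz, fs, order):
    """Basis for log|H(w)| = c0 + 2 * sum_{i>=1} c_i * cos(i * w)."""
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    basis = 2.0 * np.cos(np.outer(omega, np.arange(order + 1)))
    basis[:, 0] = 1.0  # c0 is not doubled
    return basis


def fit_regularised_cepstrum(freqs_hz, log_amps, fs, order=40, lam=1e-4):
    """Fit cepstral coefficients to log-amplitudes sampled at harmonics.

    Hypothetical helper: solves a penalised least-squares problem in which
    higher-quefrency coefficients are penalised more strongly (assumption).
    """
    log_amps = np.asarray(log_amps, dtype=float)
    basis = _cosine_basis(freqs_hz, fs, order)
    # Diagonal penalty growing with the square of the coefficient index;
    # c0 is left unpenalised.
    penalty = np.diag(np.arange(order + 1, dtype=float) ** 2)
    lhs = basis.T @ basis + lam * penalty
    rhs = basis.T @ log_amps
    return np.linalg.solve(lhs, rhs)


def cepstrum_to_log_envelope(cepstrum, freqs_hz, fs):
    """Evaluate the log-amplitude envelope at the given frequencies."""
    return _cosine_basis(freqs_hz, fs, len(cepstrum) - 1) @ cepstrum
```

At synthesis time, a sketch like this would evaluate the envelope at the harmonic frequencies of the target pitch with `cepstrum_to_log_envelope` to recover log-amplitudes. The abstract states that both the static amplitude and the dynamic slope are converted; applying a similar fit to each is one plausible reading, stated here as an assumption.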
Pages: 4889-4893
Number of pages: 5
Related papers
50 in total
  • [1] An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis
    Hu, Qiong
    Stylianou, Yannis
    Maia, Ranniery
    Richmond, Korin
    Yamagishi, Junichi
    Latorre, Javier
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014: 780-784
  • [2] Autoregressive Models for Statistical Parametric Speech Synthesis
    Shannon, Matt
    Zen, Heiga
    Byrne, William
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (03): 587-597
  • [3] A Continuous Vocoder Using Sinusoidal Model for Statistical Parametric Speech Synthesis
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096: 11-20
  • [4] A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis
    Airaksinen, Manu
    Juvela, Lauri
    Bollepalli, Bajibabu
    Yamagishi, Junichi
    Alku, Paavo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09): 1658-1670
  • [5] Statistical parametric speech synthesis
    Black, Alan W.
    Zen, Heiga
    Tokuda, Keiichi
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 1229 - +
  • [6] Statistical parametric speech synthesis
    Zen, Heiga
    Tokuda, Keiichi
    Black, Alan W.
    SPEECH COMMUNICATION, 2009, 51 (11): 1039-1064
  • [7] IMPLEMENTATION AND EVALUATION OF STATISTICAL PARAMETRIC SPEECH SYNTHESIS METHODS FOR THE PERSIAN LANGUAGE
    Bahaadini, Sara
    Sameti, Hossein
    Khorram, Soheil
    2011 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2011
  • [8] Statistical parametric speech synthesis for Ibibio
    Ekpenyong, Moses
    Urua, Eno-Abasi
    Watts, Oliver
    King, Simon
    Yamagishi, Junichi
    SPEECH COMMUNICATION, 2014, 56: 243-251
  • [9] An introduction to statistical parametric speech synthesis
    King, Simon
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2011, 36 (05): 837-852
  • [10] An introduction to statistical parametric speech synthesis
    Simon King
    Sadhana, 2011, 36: 837-852