Hidden-Markov-model based statistical parametric speech synthesis for Marathi with optimal number of hidden states

Cited by: 4
Authors
Patil, Suraj Pandurang [1]
Lahudkar, Swapnil Laxman [2]
Affiliations
[1] JSPM, Rajarshi Shahu Coll Engn, Pune, Maharashtra, India
[2] JSPM, Imperial Coll Engn & Res, Pune, Maharashtra, India
Keywords
Speech Synthesis; Hidden Markov Model; Context-dependent HMM; HMM Toolkit;
DOI
10.1007/s10772-018-09578-2
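Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809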
Abstract
Statistical parametric speech synthesis (SPSS) systems based on hidden Markov models (HMMs) and deep neural networks have gained significant attention from researchers because of their flexibility in generating speech waveforms with diverse voice qualities and styles. This paper describes an HMM-based SPSS system for the Marathi language. In the proposed synthesis method, the speech parameter trajectories used for synthesis are generated from trained HMMs. We recorded our own database of 5300 phonetically balanced Marathi sentences to train context-dependent HMMs with five, seven, and nine hidden states. The subjective quality measures (MOS and PWP) show that HMMs with seven hidden states give adequate synthesized speech quality compared with five-state HMMs, at lower time complexity than nine-state HMMs. The contextual features used in the experiments include the position of the observed phoneme in its syllable, word, and sentence.
Pages: 93-98
Number of pages: 6