Intensity Modeling for Syllable Based Text-to-Speech Synthesis

Cited by: 0
Authors
Reddy, V. Ramu [1]
Rao, K. Sreenivasa [1]
Affiliation
[1] Indian Inst Technol, Sch Informat Technol, Kharagpur 721302, W Bengal, India
Source
CONTEMPORARY COMPUTING | 2012 / Vol. 306
Keywords
Syllable intensities; Intensity prediction; LR; CART; FFNN; Phonological; Contextual; Positional; Articulatory; Linguistic; Production; Naturalness; Intelligibility
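DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract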
The quality of text-to-speech (TTS) synthesis systems can be improved by controlling the intensities of speech segments in addition to their durations and intonation. This paper proposes linguistic and production constraints for modeling the intensity patterns of a sequence of syllables. Linguistic constraints are represented by positional, contextual and phonological features, and production constraints are represented by articulatory features associated with syllables. In this work, a feedforward neural network (FFNN) is proposed to model the intensities of syllables. The proposed FFNN model is evaluated by means of objective measures: average prediction error (mu), standard deviation (sigma), correlation coefficient (gamma_{X,Y}) and the percentage of syllables predicted within different deviations. The prediction performance of the proposed model is compared with that of other statistical models, namely Linear Regression (LR) and Classification and Regression Tree (CART) models. The models are also evaluated by means of subjective listening tests on synthesized speech generated by incorporating the predicted syllable intensities into a Bengali TTS system. From the evaluation studies, it is observed that the prediction accuracy of the FFNN models is better than that of the other models.
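The objective measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: mu is taken as the mean absolute error, sigma as the standard deviation of the absolute errors, gamma_{X,Y} as the Pearson correlation between actual and predicted intensities, and the deviation as the absolute error expressed as a percentage of the actual value. The function name `intensity_metrics`, the threshold values, and the sample numbers are hypothetical and are not taken from the paper.

```python
import math

def intensity_metrics(actual, predicted, thresholds=(2, 5, 10)):
    """Return (mu, sigma, gamma, within) for paired syllable intensities.

    All formulas are assumed definitions, not the paper's confirmed ones.
    """
    n = len(actual)
    # Absolute prediction error per syllable
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    # mu: average prediction error
    mu = sum(errors) / n
    # sigma: standard deviation of the absolute errors around mu
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)
    # gamma: Pearson correlation between actual and predicted values
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    gamma = cov / (std_a * std_p)
    # Percentage deviation of each prediction from the actual value
    deviations = [100 * e / abs(a) for e, a in zip(errors, actual)]
    # Share of syllables (in percent) predicted within each threshold
    within = {t: 100 * sum(d <= t for d in deviations) / n for t in thresholds}
    return mu, sigma, gamma, within

# Hypothetical intensities, for illustration only
actual = [60, 70, 80, 50]
predicted = [63, 70, 72, 55]
mu, sigma, gamma, within = intensity_metrics(actual, predicted)
print(mu, sigma, gamma, within)
```

The same function could be applied to the outputs of each model (FFNN, LR, CART) on a common test set to compare them on equal terms.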
Pages: 106-117
Number of pages: 12