A waveform concatenation technique for text-to-speech synthesis

Cited by: 8
Authors
Panda S.P. [1]
Nayak A.K. [2]
Affiliations
[1] Department of CSE, Silicon Institute of Technology, Bhubaneswar, Odisha
[2] Department of CS&IT, Siksha ‘O’ Anusandhan University, Bhubaneswar, Odisha
Keywords
Concatenative technique; Indian languages; Speech synthesis; Text-to-speech system; Waveform concatenation
DOI
10.1007/s10772-017-9463-8
Abstract
Designing text-to-speech systems capable of producing natural-sounding speech in different Indian languages is a challenging and ongoing problem. Because Indian languages have a large number of possible pronunciations, a concatenative speech synthesis technique must store a large number of speech segments in its speech database to achieve highly natural speech. However, the resulting database size makes such systems unusable on small handheld devices or human-computer interactive systems with limited storage resources. In this paper, we propose a fraction-based waveform concatenation technique that produces intelligible speech segments from a small-footprint speech database. The results of all the experiments performed show that the proposed technique is effective in producing intelligible speech segments in different Indian languages, with much lower storage and computation overhead than the existing syllable-based technique. © 2017, Springer Science+Business Media, LLC.
Pages: 959-976
Number of pages: 17