A survey on speech synthesis techniques in Indian languages

Cited by: 11
Authors
Panda, Soumya Priyadarsini [1]
Nayak, Ajit Kumar [2]
Rai, Satyananda Champati [3]
Affiliations
[1] Silicon Inst Technol, Dept CSE, Bhubaneswar, Odisha, India
[2] Siksha O Anusandhan Univ, Dept CS & IT, Bhubaneswar, Odisha, India
[3] Silicon Inst Technol, Dept IT, Bhubaneswar, Odisha, India
Keywords
Text to speech system; Speech synthesis; Indian languages; Concatenative synthesis; Formant synthesis; Articulatory synthesis; Syllable-based synthesis; HMM-based synthesis; Statistical parametric synthesis; Polyglot synthesis; Multilingual synthesis; Waveform concatenation; Deep learning; FORM CONCATENATION TECHNIQUE; SYNTHESIS SYSTEM; ARTICULATORY SYNTHESIS; TEXT; SELECTION; INTELLIGIBILITY; FEATURES; QUALITY; ENHANCEMENT; GENERATION;
DOI
10.1007/s00530-020-00659-4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-to-speech (TTS) technology has made significant progress over the past decade and remains an active area of research and development for human-computer interactive systems. Although a number of speech synthesis models are available for different languages, each addressing particular domain requirements and applications, no source of information on current trends in Indian language speech synthesis has been available to date, which makes it difficult for beginners to start research on TTS systems for low-resourced languages. This paper reviews the contributions of different researchers in the field of Indian language speech synthesis, together with a study of Indian language characteristics and the associated challenges in designing TTS systems. It also discusses a set of available applications and tools resulting from projects undertaken by different organizations, along with a set of possible future developments, to provide a single reference to an important strand of research in speech synthesis that may benefit anyone interested in starting research in this area.
Pages: 453-478
Page count: 26