Building Speech Synthesis Systems for Indian Languages

Cited by: 0
Authors
Pradhan, Abhijit [1]
Prakash, Anusha [1]
Shanmugam, S. Aswin [1]
Kasthuri, G. R. [1]
Krishnan, Raghava [1]
Murthy, Hema A. [1]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Madras, Tamil Nadu, India
Source
2015 Twenty First National Conference on Communications (NCC) | 2015
Keywords
Indian languages; text-to-speech synthesis; syllable-based speech synthesis; segmentation; unit selection synthesis; statistical parametric synthesis;
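DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract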
In this paper, new efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented. The synthesisers are built around both concatenative and statistical parametric speech synthesis frameworks. TTS systems require accurate segmentation, and obtaining accurate segmentation at the phone level is difficult. Manual segmentation is prone to human error, while automatic segmentation using statistical approaches (hidden Markov model (HMM) based approaches) yields poor boundary information when the amount of training data is small. A semi-automatic, group delay based syllable segmentation tool is discussed. The tool is semi-automatic because some of the boundaries obtained are inaccurate and have to be manually corrected. Next, a segmentation algorithm that uses both HMM based segmentation and group delay based segmentation is used to obtain accurate boundaries automatically. The boundaries obtained are used in the syllable-based synthesiser for unit selection. In the statistical phone-based synthesiser, embedded re-estimation is performed at the phone level. Syllable-based and penta-phone based HMMs are used for building the synthesiser. TTS systems for 12 Indian languages, namely Tamil, Hindi, Marathi, Malayalam, Telugu, Rajasthani, Bengali, Odia, Assamese, Manipuri, Kannada and Gujarati, are built using semi-automatic segmentation, and synthesisers have been built for 7 Indian languages using automatic segmentation. Evaluation of the semi-automatic segmentation systems indicates that the MOS (mean opinion score) is above 3.0 for most of the languages. Pair comparison tests on semi-automatic vs. automatic segmentation show that automatic segmentation is preferred.
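The abstract does not state how the HMM based and group delay based segmentations are combined. Purely as an illustration, and not as the authors' algorithm, one plausible reading is that each HMM syllable boundary is moved to the nearest group delay boundary candidate when one lies close enough. The function name `snap_boundaries`, the tolerance `tol` and the sample times below are all hypothetical.

```python
from bisect import bisect_left


def snap_boundaries(hmm_bounds, gd_bounds, tol=0.02):
    """Move each HMM boundary (seconds) to the nearest group delay
    candidate within `tol` seconds; otherwise keep the HMM boundary.

    Assumption: this snapping rule is an illustrative guess, not the
    method described in the paper.
    """
    gd_sorted = sorted(gd_bounds)
    refined = []
    for b in hmm_bounds:
        i = bisect_left(gd_sorted, b)
        # Only the candidates on either side of the insertion point
        # can be the nearest one.
        candidates = gd_sorted[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda g: abs(g - b), default=None)
        if nearest is not None and abs(nearest - b) <= tol:
            refined.append(nearest)
        else:
            refined.append(b)
    return refined


# Hypothetical boundary times in seconds.
hmm = [0.10, 0.31, 0.55]
group_delay = [0.115, 0.30, 0.70]
refined = snap_boundaries(hmm, group_delay)
```

In this sketch the first two HMM boundaries are replaced by nearby group delay candidates, while the third is left unchanged because no candidate falls within the tolerance.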
Pages: 6