The Development of Phone Duration Model in Speech Synthesis in the Serbian Language

Cited by: 0
Authors
Sovilj-Nikic, Sandra [1]
Sovilj-Nikic, Ivan [1]
Affiliations
[1] Univ Novi Sad, Fac Tech Sci, Trg Dositeja Obradov 6, Novi Sad, Serbia
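Source
2015 23rd Telecommunications Forum TELFOR (TELFOR) | 2015
Keywords
duration model; machine learning algorithms; phone duration; regression
DOI
Not available
Chinese Library Classification
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract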
Because segmental duration matters perceptually, automatic prediction of natural phone durations is essential for natural-sounding synthetic speech. This paper presents phone duration models for the Serbian language built with several machine learning techniques: linear regression, tree-based algorithms, and meta-learning algorithms such as additive regression, bagging and stacking. Models were developed for the full phoneme set of Serbian as well as for vowels and consonants separately. A feature set of 21 parameters describing phones and their contexts, extracted from a large Serbian speech database, was used for segmental duration prediction. The model obtained with additive regression outperformed the other models presented in the paper. The results for the full phoneme set, and for consonants and vowels separately, are comparable with or better than those reported in the literature for Czech, Greek, English, Lithuanian, Korean, Turkish and the Indian languages Hindi and Telugu.
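To make the additive regression idea named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the base learner, the number of iterations or the shrinkage value, so the one-split regression stump, `n_rounds` and `shrinkage` below are assumptions. The three binary features and the durations in milliseconds are invented toy values, not the 21 features or the corpus used in the paper.

```python
from statistics import mean

def fit_stump(X, residuals):
    """Fit a one-split regression stump that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between neighbouring feature values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = mean(left), mean(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:
        # No feature varies: fall back to a constant prediction.
        return 0, float("inf"), mean(residuals), 0.0
    return best[1:]

def predict_stump(stump, row):
    j, thr, left_mean, right_mean = stump
    return left_mean if row[j] <= thr else right_mean

def fit_additive(X, y, n_rounds=20, shrinkage=0.5):
    """Additive regression: each stump is fitted to the residuals left so far."""
    base = mean(y)
    residuals = [t - base for t in y]
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        residuals = [r - shrinkage * predict_stump(stump, row)
                     for row, r in zip(X, residuals)]
    return base, stumps, shrinkage

def predict(model, row):
    base, stumps, shrinkage = model
    return base + shrinkage * sum(predict_stump(s, row) for s in stumps)

def rmse(model, X, y):
    errors = [(predict(model, row) - t) ** 2 for row, t in zip(X, y)]
    return (sum(errors) / len(y)) ** 0.5

# Hypothetical toy data: [is_vowel, is_stressed, is_phrase_final] -> duration (ms).
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1],
     [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [95, 70, 130, 100, 55, 60, 75, 80]

baseline = fit_additive(X, y, n_rounds=0)  # predicts the mean duration only
model = fit_additive(X, y)

print("baseline RMSE:", rmse(baseline, X, y))
print("additive RMSE:", rmse(model, X, y))
```

The comparison is on training data only, so it shows how successive residual fits reduce error, not how well such a model generalises. A realistic evaluation would use held-out data.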
Pages: 693-699
Page count: 7