Tree-based Phone Duration Modelling of the Serbian Language

Cited by: 6
Authors
Sovilj-Nikic, S. [1]
Delic, V. [1]
Sovilj-Nikic, I. [1]
Markovic, M. [2]
Affiliations
[1] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia
[2] Univ Novi Sad, Fac Philosophy, Novi Sad 21000, Serbia
Keywords
Decision trees; machine learning algorithms; speech; speech synthesis; segmental duration
DOI
10.5755/j01.eee.20.3.4090
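Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract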
Because segmental duration matters perceptually, automatic prediction of natural phone durations is essential for natural-sounding synthesized speech. This paper presents a phone duration prediction model for the Serbian language based on a tree-based machine learning approach. A large speech corpus and a feature set of 21 parameters describing phones and their contexts were used for segmental duration prediction. Phone duration modelling is based on attributes such as the current segment identity, preceding and following segment types, manner of articulation (for consonants) and voicing of neighbouring phones, lexical stress, part of speech, word length, the position of the segment in the syllable, the position of the syllable in the word, the position of the word in the phrase, and phrase break level. These features were extracted from the large speech database for the Serbian language. The results obtained for the full phoneme set using a regression tree (root-mean-squared error (RMSE) of 14.8914 ms, mean absolute error (MAE) of 11.1947 ms, and correlation coefficient of 0.8796) are comparable with those reported in the literature for Czech, Greek, Lithuanian, Korean, Turkish, and the Indian languages Hindi and Telugu.
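For readers unfamiliar with the three evaluation measures named in the abstract, the following is a minimal sketch of how RMSE, MAE, and the correlation coefficient can be computed from measured and predicted phone durations. It assumes the correlation coefficient is Pearson's; the function names and the duration values are hypothetical and are not taken from the paper or its corpus.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def pearson(actual, predicted):
    """Pearson correlation coefficient (assumed to be the one reported)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up phone durations in milliseconds, for illustration only.
actual_ms = [62.0, 85.0, 110.0, 47.0, 93.0]
predicted_ms = [70.0, 80.0, 100.0, 55.0, 90.0]

print("RMSE (ms):", rmse(actual_ms, predicted_ms))
print("MAE (ms):", mae(actual_ms, predicted_ms))
print("Correlation:", pearson(actual_ms, predicted_ms))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the reported RMSE is the larger of the two error figures.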
Pages: 77-82
Number of pages: 6