UNIT SELECTION SPEECH SYNTHESIS USING MULTIPLE SPEECH UNITS AT NON-ADJACENT SEGMENTS FOR PROSODY AND WAVEFORM GENERATION

Cited by: 2
Authors
Tamura, Masatsune [1 ]
Braunschweiler, Norbert [2 ]
Kagoshima, Takehiko [1 ]
Akamine, Masami [1 ]
Affiliations
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Saiwai Ku, 1 Komukai Toshiba Cho, Kawasaki, Kanagawa 2128582, Japan
[2] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010
Keywords
concatenative speech synthesis; unit selection; prosody generation; unit fusion
DOI
10.1109/ICASSP.2010.5495151
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we propose a speech synthesis method that combines natural waveform concatenation based speech synthesis with our baseline plural unit selection and fusion method. The proposed method has two main features: (i) prosody regeneration from the selected speech units and (ii) the use of multiple speech units at non-adjacent segments. A non-adjacent segment is a segment whose previous or following speech unit in the optimum speech unit sequence is not adjacent to it in the database. Using the prosody of the selected speech units retains the original prosodic expressions and sounds of the recorded speech, while using multiple speech units at non-adjacent segments reduces discontinuities. MOS evaluations showed that the proposed method provides a clear improvement over the conventional unit selection method and our baseline method.
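The definition of a non-adjacent segment in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Unit` record, its `utterance` and `position` fields, and both function names are hypothetical, and the sketch assumes each selected unit can be located by a recording identifier plus an index within that recording. It follows one reading of the definition, in which a segment is flagged when either neighbour in the selected sequence is not its neighbour in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    # Both fields are hypothetical; the abstract does not describe the database layout.
    utterance: str  # identifier of the recorded utterance the unit was taken from
    position: int   # index of the unit within that utterance


def contiguous(a: Unit, b: Unit) -> bool:
    """Return True if b immediately follows a in the recorded database."""
    return a.utterance == b.utterance and b.position == a.position + 1


def non_adjacent_segments(seq: list[Unit]) -> list[int]:
    """Return indices of segments whose previous or following selected unit
    is not adjacent to them in the database (assumed reading of the abstract)."""
    flagged = []
    for i in range(len(seq)):
        prev_break = i > 0 and not contiguous(seq[i - 1], seq[i])
        next_break = i + 1 < len(seq) and not contiguous(seq[i], seq[i + 1])
        if prev_break or next_break:
            flagged.append(i)
    return flagged


# Two units from utterance "u1" followed by three units from utterance "u7":
selected = [Unit("u1", 4), Unit("u1", 5), Unit("u7", 2), Unit("u7", 3), Unit("u7", 4)]
print(non_adjacent_segments(selected))  # prints [1, 2]
```

Under this reading, only the segments on either side of the join between the two recordings are flagged. These would be the segments where multiple speech units are used, while the remaining segments keep the natural recorded waveform.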
Pages: 4802-4805
Number of pages: 4
Related papers
7 in total
[1] [Anonymous], 1950, ANN I STAT MATH, DOI 10.1007/BF02919500
[2] Hunt AJ, 1996, INT CONF ACOUST SPEE, P373, DOI 10.1109/ICASSP.1996.541110
[3] Kagoshima T., 1998, P ICSLP 98 DEC, P1975
[4] Concatenative speech synthesis based on the plural unit selection and fusion method [J]. Mizutani, T; Kagoshima, T. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (11): 2565-2572
[5] Syrdal A. K., 2000, P 6 INT C SPOKEN LAN, V3, P410
[6] Fast concatenative speech synthesis using pre-fused speech units based on the plural unit selection and fusion method [J]. Tamura, Masatsune; Mizutani, Tatsuya; Kagoshima, Takehiko. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (02): 544-553
[7] Toda T, 2002, INT CONF ACOUST SPEE, P465