Statistical prosodic modeling: From corpus design to parameter estimation

Cited by: 31
Authors
Bellegarda, JR [1]
Silverman, KEA [1]
Lenzo, K [1]
Anderson, V [1]
Affiliation
[1] Apple Comp Inc, Spoken Language Grp, Cupertino, CA 95014 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001 / Vol. 9 / No. 1
Keywords
intonation modeling; prosodic representation; prosody generation; speech database design and collection; text-to-speech systems;
DOI
10.1109/89.890071
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The increasing availability of carefully designed and collected speech corpora opens up new possibilities for the statistical estimation of formal multivariate prosodic models. At Apple Computer, statistical prosodic modeling exploits the Victoria corpus, recently created to broadly support ongoing speech synthesis research and development. This corpus is composed of five constituent parts, each designed to cover a specific aspect of speech synthesis: polyphones, prosodic contexts, reiterant speech, function word sequences, and continuous speech. This paper focuses on the use of the Victoria corpus in the statistical estimation of duration and pitch models for Apple's next-generation text-to-speech system in Macintosh OS X. Duration modeling relies primarily on the subcorpus of prosodic contexts, which is instrumental in uncovering empirical evidence in favor of a piecewise linear transformation in the well-known sums-of-products approach. Pitch modeling relies primarily on the subcorpus of reiterant speech, which makes possible the optimization of superpositional pitch models with more accurate underlying smooth contours. Experimental results illustrate the improved prosodic representation resulting from these new duration and pitch models.
Pages: 52-66
Page count: 15