Developments in corpus-based speech synthesis: Approaching natural conversational speech

Cited by: 20
Author
Campbell, N [1]
Affiliation
[1] ATR Network Informat Labs, Dept Emergent Commun, Kyoto 6190288, Japan
Keywords
speech synthesis; corpora; concatenation; paralinguistic information; communication; affect;
DOI
10.1093/ietisy/e88-d.3.376
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and changes the focus of unit selection from segmental or phonetic continuity to one of prosodic and discoursal appropriateness instead. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
Pages: 376-383
Page count: 8