Audiovisual representation of prosody in expressive speech communication

Cited by: 34
Authors
Granström, B [1]
House, D [1]
Affiliation
[1] KTH, Ctr Speech Technol, Dept Speech Mus & Hearing, S-10044 Stockholm, Sweden
Keywords
audiovisual prosody; multimodal communication; expressive speech; talking heads; animation
DOI
10.1016/j.specom.2005.02.017
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Prosody in a single speaking style (often read speech) has been studied extensively in acoustic speech. During the past few years we have expanded our interest in two directions: (1) prosody in expressive speech communication and (2) prosody as an audiovisual expression. Understanding the interactions between visual expressions (primarily in the face) and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is, for obvious reasons, tightly connected to the acoustics (e.g. lip and jaw movements), but there are other articulatory movements that do not show up on the outside of the face. Furthermore, many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. In this presentation we give some examples of recent work, primarily at KTH, addressing these questions. We report on methods for the acquisition and modeling of visual and acoustic data, and on some evaluation experiments in which audiovisual prosody is tested. The context of much of our work in this area is the creation of an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems, e.g. a virtual language teacher. © 2005 Elsevier B.V. All rights reserved.
Pages: 473-484
Number of pages: 12