Expressive Speech Animation Synthesis with Phoneme-Level Controls

Cited by: 18
Authors
Deng, Z. [1]
Neumann, U. [2]
Affiliations
[1] Univ Houston, Dept Comp Sci, Comp Graph & Interact Media Lab, Houston, TX 77204 USA
[2] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
Keywords
Facial animation; speech animation; data-driven; facial expression; phoneme-Isomap; motion capture
DOI
10.1111/j.1467-8659.2008.01192.x
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. The system is based on a pre-recorded facial motion capture database, for which an actress was directed to recite a pre-designed corpus with four facial expressions (neutral, happiness, anger and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm searches the processed facial motion database for the best-matched captured motion clips by minimizing a cost function. Users can optionally specify 'hard constraints' (motion-node constraints for expressing phoneme utterances) and 'soft constraints' (emotion modifiers) to guide this search process. We also introduce a phoneme-Isomap interface for visualizing and interacting with phoneme clusters, which are typically composed of thousands of facial motion capture frames. With this visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized facial motion and captured motion showed that this system is effective for producing realistic expressive speech animations.
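The constrained search described in the abstract can be illustrated with a small dynamic programming sketch. This is not the paper's implementation: the abstract does not give the cost function, so the sketch assumes a transition cost (the gap between the end of one clip and the start of the next, reduced here to a hypothetical 1-D pose value) plus a fixed penalty when a clip's emotion differs from the requested emotion modifier (the soft constraint). Hard constraints are modeled as a mapping from a phoneme position to a required motion-node id. All names (`MotionNode`, `search_motion_nodes`, `emotion_penalty`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionNode:
    """Hypothetical stand-in for one captured motion clip of a single phoneme."""
    node_id: int
    phoneme: str
    emotion: str
    start_pose: float  # assumption: 1-D summary of the clip's first frame
    end_pose: float    # assumption: 1-D summary of the clip's last frame


def search_motion_nodes(phonemes, emotions, database, hard=None,
                        emotion_penalty=1.0):
    """Return (total_cost, node_ids) for the cheapest node sequence.

    phonemes: phoneme label per time step
    emotions: requested emotion per time step (soft constraint)
    hard:     optional {step_index: node_id} (hard constraint)
    """
    if not phonemes:
        return 0.0, []
    hard = hard or {}

    # Candidate nodes per step: same phoneme, narrowed by any hard constraint.
    steps = []
    for i, ph in enumerate(phonemes):
        cands = [n for n in database if n.phoneme == ph]
        if i in hard:
            cands = [n for n in cands if n.node_id == hard[i]]
        if not cands:
            raise ValueError(f"no candidate motion node for step {i} ({ph})")
        steps.append(cands)

    def node_cost(i, node):
        # Soft constraint: penalize, but do not forbid, an emotion mismatch.
        return 0.0 if node.emotion == emotions[i] else emotion_penalty

    def transition_cost(a, b):
        # Assumed smoothness term between consecutive clips.
        return abs(a.end_pose - b.start_pose)

    # history[i][j] = (cheapest cost of a path ending at steps[i][j], back-pointer)
    history = [[(node_cost(0, n), None) for n in steps[0]]]
    for i in range(1, len(steps)):
        row = []
        for n in steps[i]:
            cost, prev = min(
                (history[-1][k][0] + transition_cost(p, n), k)
                for k, p in enumerate(steps[i - 1])
            )
            row.append((cost + node_cost(i, n), prev))
        history.append(row)

    # Backtrack from the cheapest final node.
    j = min(range(len(history[-1])), key=lambda k: history[-1][k][0])
    total = history[-1][j][0]
    path = []
    for i in range(len(steps) - 1, -1, -1):
        path.append(steps[i][j].node_id)
        j = history[i][j][1]
    path.reverse()
    return total, path
```

In this sketch a hard constraint prunes the candidate set before the search, while a soft constraint only adds cost, so a mismatched-emotion clip can still be chosen if it joins its neighbours much more smoothly.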
Pages: 2096-2113
Page count: 18