Head and facial gestures synthesis using PAD model for an expressive talking avatar

Cited by: 23
Authors
Jia, Jia [1,2]
Wu, Zhiyong [3,4]
Zhang, Shen [1,2]
Meng, Helen M. [3,4]
Cai, Lianhong [1,2]
Affiliations
[1] Tsinghua University, Key Laboratory of Pervasive Computing, Ministry of Education, Beijing 100084, China
[2] Tsinghua University, Department of Computer Science and Technology, Beijing 100084, China
[3] The Chinese University of Hong Kong, Department of Systems Engineering and Engineering Management, Hong Kong, China
[4] Tsinghua University, Graduate School at Shenzhen, Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen 518055, China
Funding
National Natural Science Foundation of China
Keywords
Text-to-visual-speech; head motion; facial expression; talking avatar; speech synthesis; animation; driven; movement; emotion; prosody; motion
DOI
10.1007/s11042-013-1604-8
CLC classification
TP [Automation and computer technology]
Discipline code
0812
Abstract
This paper proposes to synthesize expressive head and facial gestures for a talking avatar using the three-dimensional pleasure-displeasure, arousal-nonarousal and dominance-submissiveness (PAD) descriptors of semantic expressivity. The PAD model is adopted to bridge the gap between text semantics and visual motion features. Based on a correlation analysis between PAD annotations and motion patterns derived from a head and facial motion database, we build an explicit mapping from PAD descriptors to facial animation parameters, using linear regression for head motion and neural networks for facial expression. A PAD-driven talking avatar is implemented in a text-to-visual-speech system by generating expressive head motions at the prosodic-word level from the (P, A) descriptors of lexical appraisal, and facial expressions at the sentence level from the PAD descriptors of emotional information. A series of PAD reverse-evaluation and comparative perceptual experiments shows that head and facial gestures synthesized with the PAD model significantly enhance the visual expressivity of the talking avatar.
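As a rough illustration of the two mappings described in the abstract, the sketch below (Python; not the authors' code) fits a linear regression from the (P, A) descriptors to head-motion parameters and a small neural network from the full (P, A, D) triple to facial-expression parameters. All training data, output dimensions, and parameter names here are hypothetical placeholders standing in for the paper's motion-capture features.

# Minimal sketch, assuming synthetic stand-in data for the paper's
# head-and-facial-motion database. Output dimensions are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical training set: one PAD triple per sample, each value in [-1, 1].
pad = rng.uniform(-1.0, 1.0, size=(200, 3))   # columns: (P, A, D)
head_params = rng.normal(size=(200, 4))       # e.g. pitch/yaw/roll amplitudes + rate
face_params = rng.normal(size=(200, 6))       # e.g. MPEG-4-style facial animation parameters

# Head motion: per the abstract, only (P, A) of lexical appraisal drive it,
# via linear regression.
head_model = LinearRegression().fit(pad[:, :2], head_params)

# Facial expression: the full (P, A, D) triple mapped through a neural network.
face_model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=0).fit(pad, face_params)

# Synthesis for one utterance: a pleasant, aroused, mildly dominant state.
query = np.array([[0.8, 0.6, 0.4]])
print("head motion params:", head_model.predict(query[:, :2]))
print("facial expression params:", face_model.predict(query))

At synthesis time, the predicted head-motion parameters would be applied per prosodic word and the facial-expression parameters per sentence, matching the two timescales the abstract describes.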
Pages: 439-461
Number of pages: 23
References (43 total)
[1] Albrecht I. (2005). Virtual Reality, 8, 201. DOI: 10.1007/s10055-005-0153-5
[2] [Anonymous] (2002). FACS Investigator's Guide.
[3] Boukricha H. (2009). Proc. 3rd International Conference on Affective Computing and Intelligent Interaction, p. 21.
[4] Bradley M.M., Lang P.J. (1994). Measuring emotion: The Self-Assessment Manikin and the Semantic Differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49-59.
[5] Busso C., Deng Z., Neumann U. (2005). Natural head motion synthesis driven by acoustic prosodic features. Computer Animation and Virtual Worlds, 16(3-4), 283-290.
[6] Busso C., Narayanan S.S. (2007). Interrelation between speech and facial gestures in emotional utterances: A single subject study. IEEE Transactions on Audio, Speech, and Language Processing, 15(8), 2331-2347.
[7] Busso C., Deng Z., Grimm M., Neumann U., Narayanan S. (2007). Rigid head motion in expressive speech animation: Analysis and synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 1075-1086.
[8] Chuang E., Bregler C. (2005). Mood swings: Expressive speech animation. ACM Transactions on Graphics, 24(2), 331-347.
[9] Cosatto E., Ostermann J., Graf H.P., Schroeter J. (2003). Lifelike talking faces for interactive services. Proceedings of the IEEE, 91(9), 1406-1429.
[10] Deng Z. (2006). Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, p. 251.