Head and facial gestures synthesis using PAD model for an expressive talking avatar

Cited by: 23
Authors
Jia, Jia [1 ,2 ]
Wu, Zhiyong [3 ,4 ]
Zhang, Shen [1 ,2 ]
Meng, Helen M. [3 ,4 ]
Cai, Lianhong [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Minist Educ China, Key Lab Pervas Comp, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[4] Tsinghua Univ, Grad Sch Shenzhen, Tsinghua CUHK Joint Res Ctr Media Sci Technol & S, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text-to-visual-speech; Head motion; Facial expression; Talking avatar; SPEECH SYNTHESIS; ANIMATION; DRIVEN; MOVEMENT; EMOTION; PROSODY; MOTION;
DOI
10.1007/s11042-013-1604-8
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Discipline code
0812;
Abstract
This paper proposes to synthesize expressive head and facial gestures for a talking avatar using the three-dimensional pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness (PAD) descriptors of semantic expressivity. The PAD model is adopted to bridge the gap between text semantics and visual motion features. Based on a correlation analysis between PAD annotations and motion patterns derived from a head and facial motion database, we build an explicit mapping from PAD descriptors to facial animation parameters, using linear regression for head motion and neural networks for facial expression. A PAD-driven talking avatar is implemented within a text-to-visual-speech system by generating expressive head motions at the prosodic-word level from the (P, A) descriptors of lexical appraisal, and facial expressions at the sentence level from the PAD descriptors of emotional information. A series of PAD reverse-evaluation and comparative perceptual experiments shows that head and facial gestures synthesized with the PAD model significantly enhance the visual expressivity of the talking avatar.
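The abstract describes fitting an explicit linear-regression mapping from PAD descriptors to facial animation parameters for head motion. A minimal sketch of such a mapping, using ordinary least squares on toy data; the variable names, the numeric values, and the single "head-pitch amplitude" target are illustrative assumptions, not taken from the paper:

```python
# Hedged sketch: learn a linear map from (P, A, D) descriptors to one
# animation parameter, in the spirit of the paper's linear regression for
# head motion. All data below is made up for illustration.
import numpy as np

def fit_pad_to_fap(pad: np.ndarray, fap: np.ndarray) -> np.ndarray:
    """Least-squares fit of fap ~ [pad, 1] @ w (bias term included)."""
    X = np.hstack([pad, np.ones((pad.shape[0], 1))])  # append bias column
    w, *_ = np.linalg.lstsq(X, fap, rcond=None)
    return w

def predict_fap(w: np.ndarray, pad_point: np.ndarray) -> float:
    """Predict the animation parameter for one (P, A, D) triple."""
    return float(np.hstack([pad_point, 1.0]) @ w)

# Toy training set: four annotated clips, PAD values in [-1, 1].
pad = np.array([[ 0.8,  0.6,  0.4],
                [-0.5,  0.2, -0.3],
                [ 0.1, -0.7,  0.6],
                [-0.9, -0.4, -0.8]])
fap = np.array([12.0, -3.0, 4.0, -10.0])  # hypothetical head-pitch amplitudes

w = fit_pad_to_fap(pad, fap)
print(predict_fap(w, np.array([0.8, 0.6, 0.4])))  # close to the training value
```

In the paper the same PAD-to-parameter idea is applied per prosodic word for head motion and, with a neural network instead of a linear map, per sentence for facial expression; the sketch above covers only the linear case.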
Pages: 439-461 (23 pages)