Sample-based synthesis of photo-realistic talking heads

Cited by: 47
Authors
Cosatto, E [1 ]
Graf, HP [1 ]
Affiliation
[1] AT&T Bell Labs, Res, Red Bank, NJ 07701 USA
Source
COMPUTER ANIMATION 98 - PROCEEDINGS | 1998
Keywords
talking-head synthesis; sample-based synthesis; photo-realistic rendering; face recognition and location; sample-based coarticulation;
DOI
10.1109/CA.1998.681914
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a system that generates photo-realistic video animations of talking heads. First, the system derives head models from existing video footage using image-recognition techniques: it locates, extracts, and labels facial parts such as the mouth, eyes, and eyebrows into a compact library. Then, using these face models and a text-to-speech synthesizer, it synthesizes new video sequences of the head in which the lips move in synchrony with the accompanying soundtrack. Emotional cues and conversational signals are produced by combining head movements, raised eyebrows, wide-open eyes, etc. with the mouth animation. For these animations to be believable, care must be taken to align the facial parts so that they blend smoothly into each other and produce seamless animations. Our system uses precise multi-channel facial-recognition techniques to track facial parts and derives the exact 3D position of the head, enabling automatic extraction of normalized face parts. Such talking-head animations are useful because they generally increase the intelligibility of the human-machine interface in applications where content must be narrated to the user, such as educational software.
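The abstract's pipeline, in which mouth samples from a pre-extracted library are selected frame by frame to match phoneme timings from a TTS engine, can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the viseme classes, phoneme names, and frame rate are assumptions for the example.

```python
# Illustrative sketch (not the paper's implementation) of sample-based
# lip sync: a library of normalized mouth samples is indexed by viseme
# (a class of phonemes with similar mouth shapes); TTS phoneme timings
# are converted into one viseme label per video frame.

# Hypothetical phoneme-to-viseme grouping for the example.
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip-teeth", "v": "lip-teeth",
    "aa": "open-wide", "ae": "open-wide",
    "iy": "spread", "uw": "rounded",
    "sil": "neutral",
}

def frames_for_utterance(phoneme_timings, fps=25):
    """phoneme_timings: list of (phoneme, start_sec, end_sec) tuples.
    Returns one viseme label per video frame, to be rendered by
    compositing the corresponding mouth sample onto the base head."""
    if not phoneme_timings:
        return []
    total_duration = phoneme_timings[-1][2]
    n_frames = int(round(total_duration * fps))
    frames = []
    for i in range(n_frames):
        t = i / fps
        # Find the phoneme active at time t; default to silence.
        label = "sil"
        for phoneme, start, end in phoneme_timings:
            if start <= t < end:
                label = phoneme
                break
        frames.append(PHONEME_TO_VISEME.get(label, "neutral"))
    return frames

timings = [("sil", 0.0, 0.1), ("m", 0.1, 0.2), ("aa", 0.2, 0.4)]
track = frames_for_utterance(timings, fps=10)
print(track)  # → ['neutral', 'closed', 'open-wide', 'open-wide']
```

A real system would also smooth transitions between samples for coarticulation, which the paper's "sample-based coarticulation" keyword suggests is handled at the sample-selection stage rather than by geometric morphing.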
Pages: 103-110
Page count: 8