paGAN: Real-time Avatars Using Dynamic Textures

Cited by: 0
Authors
Nagano, Koki [1 ,2 ]
Seo, Jaewoo [1 ]
Xing, Jun [2 ]
Wei, Lingyu [1 ]
Li, Zimo [3 ]
Saito, Shunsuke [1 ,3 ]
Agarwal, Aviral [1 ]
Fursund, Jens [1 ]
Li, Hao [1 ,2 ,3 ]
Affiliations
[1] Pinscreen, Santa Monica, CA 90401 USA
[2] USC Inst Creat Technol, Los Angeles, CA 90094 USA
[3] Univ Southern Calif, Los Angeles, CA USA
Source
SIGGRAPH Asia '18: SIGGRAPH Asia 2018 Technical Papers | 2018
Keywords
Digital avatar; Texture synthesis; Image-based rendering; Generative adversarial network; Facial animation; DATABASE; FACES;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202 ;
Abstract
With the rising interest in personalized VR and gaming experiences comes the need to create high quality 3D avatars that are both low-cost and variegated. Due to this, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we are able to do so using only one 2D input image without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully-controllable temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshapes at run-time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality image and video synthesis, and is the first to our knowledge that is able to generate a dynamically textured avatar with a mouth interior, all from a single image.
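The abstract describes the run-time step on mobile devices: a fixed set of key textures (one per key blendshape, extracted in UV space) is linearly blended using the subject's expression blendshape weights to produce the current appearance. A minimal sketch of that linear blend is below; the function name, array shapes, and toy data are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def blend_key_textures(key_textures, weights):
    """Linearly blend precomputed key expression textures in UV space.

    key_textures: (K, H, W, 3) array, one texture per key blendshape
    weights:      (K,) blendshape activation weights at the current frame
    Returns the blended (H, W, 3) texture.
    """
    weights = np.asarray(weights, dtype=np.float32)
    # Weighted sum over the key-texture axis (axis 0)
    return np.tensordot(weights, key_textures, axes=1)

# Toy example: two 2x2 "textures", blended 25% / 75%
keys = np.stack([np.zeros((2, 2, 3)), np.ones((2, 2, 3))]).astype(np.float32)
blended = blend_key_textures(keys, [0.25, 0.75])
```

Because the per-key textures are fixed after the network runs once, each frame costs only one weighted sum, which is what makes real-time playback on mobile feasible; the mouth-interior and eye textures are composited separately, as the abstract notes.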
Pages: 12