paGAN: Real-time Avatars Using Dynamic Textures

Cited by: 0
Authors
Nagano, Koki [1 ,2 ]
Seo, Jaewoo [1 ]
Xing, Jun [2 ]
Wei, Lingyu [1 ]
Li, Zimo [3 ]
Saito, Shunsuke [1 ,3 ]
Agarwal, Aviral [1 ]
Fursund, Jens [1 ]
Li, Hao [1 ,2 ,3 ]
Affiliations
[1] Pinscreen, Santa Monica, CA 90401 USA
[2] USC Inst Creat Technol, Los Angeles, CA 90094 USA
[3] Univ Southern Calif, Los Angeles, CA USA
Source
SIGGRAPH ASIA'18: SIGGRAPH ASIA 2018 TECHNICAL PAPERS | 2018
Keywords
Digital avatar; Texture synthesis; Image-based rendering; Generative adversarial network; Facial animation; Database; Faces
DOI
Not available
Chinese Library Classification (CLC) Number
TP301 [Theory and Methods]
Discipline Classification Code
081202
Abstract
With the rising interest in personalized VR and gaming experiences comes the need to create high quality 3D avatars that are both low-cost and variegated. Due to this, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we are able to do so using only one 2D input image without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully-controllable temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshapes at run-time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality image and video synthesis, and is the first to our knowledge that is able to generate a dynamically textured avatar with a mouth interior, all from a single image.
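The run-time stage described in the abstract, linearly blending a fixed set of key expression textures in UV space using the subject's blendshape coefficients, amounts to a weighted sum per frame. Below is a minimal sketch of that blending step, assuming NumPy and illustrative array shapes; the names blend_key_textures, key_textures, and blend_weights are hypothetical and not taken from the paper.

```python
# Minimal sketch of the run-time texture blending described in the abstract.
# All names and shapes are illustrative assumptions, not the paper's API.
import numpy as np

def blend_key_textures(key_textures: np.ndarray, blend_weights: np.ndarray) -> np.ndarray:
    """Linearly blend pre-computed key expression textures in UV space.

    key_textures:  (K, H, W, 3) array, one UV texture per key blendshape,
                   assumed to have been produced offline by the network.
    blend_weights: (K,) array of the subject's blendshape coefficients for
                   the current frame (normalized here to sum to 1, which is
                   an assumption of this sketch, not a detail from the paper).
    Returns the (H, W, 3) blended texture for the current expression.
    """
    weights = blend_weights / max(float(blend_weights.sum()), 1e-8)  # guard against a zero sum
    # Contract the K axis: sum_k weights[k] * key_textures[k]
    return np.tensordot(weights, key_textures, axes=1)

# Example usage with random placeholder data
if __name__ == "__main__":
    K, H, W = 8, 256, 256
    textures = np.random.rand(K, H, W, 3).astype(np.float32)
    coeffs = np.random.rand(K).astype(np.float32)
    frame_texture = blend_key_textures(textures, coeffs)
    print(frame_texture.shape)  # (256, 256, 3)
```

This sketch covers only the linear texture blend; in the described system the blending runs in real time on mobile devices, and the mouth-interior and eye textures produced by the network are handled separately.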
Pages: 12