PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

Times cited: 881
Authors
Saito, Shunsuke [1,2]
Huang, Zeng [1,2]
Natsume, Ryota [3]
Morishima, Shigeo [3]
Kanazawa, Angjoo [4]
Li, Hao [1,2,5]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] USC Inst Creat Technol, Los Angeles, CA 90007 USA
[3] Waseda Univ, Tokyo, Japan
[4] Univ Calif Berkeley, Berkeley, CA 94720 USA
[5] Pinscreen, Los Angeles, CA USA
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
CAPTURE; VIDEO; SHAPE; POSE
DOI
10.1109/ICCV.2019.00239
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object. Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image and, optionally, from multiple input images. Highly intricate shapes, such as hairstyles and clothing, as well as their variations and deformations, can be digitized in a unified way. Compared to existing representations used for 3D deep learning, PIFu produces high-resolution surfaces, including largely unseen regions such as the back of a person. In particular, unlike voxel representations it is memory efficient, it can handle arbitrary topology, and the resulting surface is spatially aligned with the input image. Furthermore, while previous techniques are designed to process either a single image or multiple views, PIFu extends naturally to an arbitrary number of views. We demonstrate high-resolution and robust reconstructions on real-world images from the DeepFashion dataset, which contains a variety of challenging clothing types. Our method achieves state-of-the-art performance on a public benchmark and outperforms prior work on clothed human digitization from a single image.
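The abstract describes the pixel-aligned query only at a high level. The following is a minimal NumPy sketch of the idea, not the authors' code. It loosely follows the multi-view scheme described in the paper: a 3D point is projected into each view, the image feature at that pixel is bilinearly sampled and concatenated with the point's depth, a per-view embedding network (f1) is applied, embeddings are averaged so any number of views can be used, and a second network (f2) outputs an occupancy probability. Assumptions: an orthographic camera with points in normalized [-1, 1] coordinates, random per-view feature maps standing in for a trained image encoder, and tiny MLPs with random weights. All function names, shapes, and the projection convention are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, u, v):
    """Sample a (C, H, W) feature map at continuous pixel coords (u = column, v = row).

    Coordinates are clamped to the image bounds.
    """
    _, H, W = feat.shape
    u = np.clip(u, 0.0, W - 1.0)
    v = np.clip(v, 0.0, H - 1.0)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    top = (1 - du) * feat[:, v0, u0] + du * feat[:, v0, u1]
    bottom = (1 - du) * feat[:, v1, u0] + du * feat[:, v1, u1]
    return (1 - dv) * top + dv * bottom

def mlp(x, weights):
    """Tiny MLP: ReLU after every layer except the last."""
    for i, (Wm, b) in enumerate(weights):
        x = Wm @ x + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def pifu_query(X, views, f1, f2):
    """Occupancy probability in (0, 1) for a 3D point X seen from one or more views.

    views: list of (R, feat); R is a hypothetical 3x3 world-to-camera rotation and
    feat a (C, H, W) feature map for that view (stand-in for an image encoder).
    """
    embeddings = []
    for R, feat in views:
        x, y, z = R @ X                        # point in this camera's frame
        _, H, W = feat.shape
        u = (x + 1.0) * 0.5 * (W - 1)          # assumed orthographic projection to pixels
        v = (y + 1.0) * 0.5 * (H - 1)
        F = bilinear_sample(feat, u, v)        # pixel-aligned feature at the projection
        embeddings.append(mlp(np.concatenate([F, [z]]), f1))  # f1(feature, depth)
    phi = np.mean(embeddings, axis=0)          # average embeddings over views
    logit = mlp(phi, f2)[0]                    # f2 maps the pooled embedding to a score
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid -> inside/outside probability

# Toy demo with random weights (hypothetical sizes: C channels, D hidden units).
rng = np.random.default_rng(0)
C, H, W, D = 8, 32, 32, 16

def random_layer(n_out, n_in):
    return rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out)

f1 = [random_layer(D, C + 1), random_layer(D, D)]   # input: C features + 1 depth value
f2 = [random_layer(D, D), random_layer(1, D)]
front = (np.eye(3), rng.normal(size=(C, H, W)))
back = (np.diag([-1.0, 1.0, -1.0]), rng.normal(size=(C, H, W)))  # 180 deg about y
X = np.array([0.1, -0.2, 0.3])
print(pifu_query(X, [front], f1, f2), pifu_query(X, [front, back], f1, f2))
```

Averaging the per-view embeddings is what makes the query independent of the number and order of views in this sketch; a single image is simply the one-view case. Because the function is evaluated per point, a surface would be extracted by querying a grid of points and running an iso-surface method such as marching cubes, which is not shown here.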
Pages: 2304-2314
Number of pages: 11