PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

Cited by: 881

Authors:

Saito, Shunsuke [1,2]

Huang, Zeng [1,2]

Natsume, Ryota [3]

Morishima, Shigeo [3]

Kanazawa, Angjoo [4]

Li, Hao [1,2,5]

Affiliations:

[1] Univ Southern Calif, Los Angeles, CA 90007 USA

[2] USC Inst Creat Technol, Los Angeles, CA 90007 USA

[3] Waseda Univ, Tokyo, Japan

[4] Univ Calif Berkeley, Berkeley, CA 94720 USA

[5] Pinscreen, Los Angeles, CA USA

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019

Keywords:

CAPTURE; VIDEO; SHAPE; POSE;

DOI:

10.1109/ICCV.2019.00239

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object. Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image, and optionally, multiple input images. Highly intricate shapes, such as hairstyles and clothing, as well as their variations and deformations, can be digitized in a unified way. Compared to existing representations used for 3D deep learning, PIFu produces high-resolution surfaces including largely unseen regions such as the back of a person. In particular, it is memory efficient unlike the voxel representation, can handle arbitrary topology, and the resulting surface is spatially aligned with the input image. Furthermore, while previous techniques are designed to process either a single image or multiple views, PIFu extends naturally to an arbitrary number of views. We demonstrate high-resolution and robust reconstructions on real-world images from the DeepFashion dataset, which contains a variety of challenging clothing types. Our method achieves state-of-the-art performance on a public benchmark and outperforms prior work for clothed human digitization from a single image.
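The abstract describes the idea only in words, so the following is a minimal sketch of what "pixel-aligned" querying could look like, not the authors' implementation. It assumes a hypothetical orthographic camera, a precomputed per-pixel feature map standing in for an image encoder, and a tiny randomly initialized two-layer network; the names `bilinear_sample`, `project`, `linear`, and `query_occupancy` are illustrative. A 3D point is projected to a pixel, the feature at that pixel is sampled, and the feature plus the point's depth is mapped to an occupancy value in (0, 1).

```python
import math
import random

def bilinear_sample(feat, x, y):
    """Bilinearly sample a feature map feat[channel][row][col] at continuous pixel (x, y)."""
    h, w = len(feat[0]), len(feat[0][0])
    x = min(max(x, 0.0), w - 1.0)          # clamp to the image bounds
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    out = []
    for ch in feat:
        top = ch[y0][x0] * (1 - wx) + ch[y0][x1] * wx
        bot = ch[y1][x0] * (1 - wx) + ch[y1][x1] * wx
        out.append(top * (1 - wy) + bot * wy)
    return out

def project(point, h, w):
    """Assumed orthographic camera: x, y in [-1, 1] map to pixels; z is kept as depth."""
    px, py, pz = point
    return (px + 1.0) * 0.5 * (w - 1), (1.0 - py) * 0.5 * (h - 1), pz

def linear(vec, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(v * wt for v, wt in zip(vec, row)) + b for row, b in zip(weights, bias)]

def query_occupancy(feat, point, params):
    """Pixel-aligned query: local image feature + depth -> occupancy in (0, 1)."""
    h, w = len(feat[0]), len(feat[0][0])
    u, v, depth = project(point, h, w)
    inputs = bilinear_sample(feat, u, v) + [depth]
    hidden = [max(a, 0.0) for a in linear(inputs, params["w1"], params["b1"])]  # ReLU
    logit = linear(hidden, params["w2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid

# Toy data: random features and untrained weights (placeholders, not learned values).
random.seed(0)
C, H, W, HID = 4, 8, 8, 6
feat = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
params = {
    "w1": [[random.gauss(0, 0.1) for _ in range(C + 1)] for _ in range(HID)],
    "b1": [0.0] * HID,
    "w2": [[random.gauss(0, 0.1) for _ in range(HID)]],
    "b2": [0.0],
}
occ = query_occupancy(feat, (0.2, -0.3, 0.5), params)
```

Because the function can be evaluated at any continuous 3D point, memory use does not grow with a voxel grid, which is consistent with the memory-efficiency claim in the abstract. In a trained system one would typically evaluate many points and extract a level set (for example at 0.5) as the surface; that step is omitted here.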

Citation

Pages: 2304-2314

Number of pages: 11

64 references in total

[1] Learning to Reconstruct People in Clothing from a Single RGB Camera [J]. Alldieck, Thiemo; Magnor, Marcus; Bhatnagar, Bharat Lal; Theobalt, Christian; Pons-Moll, Gerard. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 1175-1186

[2] Video Based Reconstruction of 3D People Models [J]. Alldieck, Thiemo; Magnor, Marcus; Xu, Weipeng; Theobalt, Christian; Pons-Moll, Gerard. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 8387-8397

[3] Detailed Human Avatars from Monocular Video [J]. Alldieck, Thiemo; Magnor, Marcus; Xu, Weipeng; Theobalt, Christian; Pons-Moll, Gerard. 2018 INTERNATIONAL CONFERENCE ON 3D VISION (3DV), 2018: 98-109

[4] SCAPE: Shape Completion and Animation of People [J]. Anguelov, D; Srinivasan, P; Koller, D; Thrun, S; Rodgers, J; Davis, J. ACM TRANSACTIONS ON GRAPHICS, 2005, 24 (03): 408-416

[5] [Anonymous], 2019, ARXIV190105103

[6] [Anonymous], 2018, ARXIV181202246

[7] [Anonymous], 2018, ACM T GRAPH, DOI 10.1145/3137609

[8] [Anonymous], 2017, P IEEE C COMP VIS PA

[9] [Anonymous], 2018, P EUR C COMP VIS

[10] [Anonymous], 2017, P IEEE C COMP VIS PA
