ECON: Explicit Clothed humans Optimized via Normal integration

Cited by: 72
Authors
Xiu, Yuliang [1 ]
Yang, Jinlong [1 ]
Cao, Xu [2 ]
Tzionas, Dimitrios [3 ]
Black, Michael J. [1 ]
Affiliations
[1] Max Planck Inst Intelligent Syst, Tubingen, Germany
[2] Osaka Univ, Osaka, Japan
[3] Univ Amsterdam, Amsterdam, Netherlands
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
Keywords
MODEL
DOI
10.1109/CVPR52729.2023.00057
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The combination of deep learning, artist-curated scans, and Implicit Functions (IF) is enabling the creation of detailed, clothed, 3D humans from images. However, existing methods are far from perfect. IF-based methods recover free-form geometry, but produce disembodied limbs or degenerate shapes for novel poses or clothes. To increase robustness for these cases, existing work uses an explicit parametric body model to constrain surface reconstruction, but this limits the recovery of free-form surfaces such as loose clothing that deviates from the body. What we want is a method that combines the best properties of implicit representation and explicit body regularization. To this end, we make two key observations: (1) current networks are better at inferring detailed 2D maps than full-3D surfaces, and (2) a parametric model can be seen as a "canvas" for stitching together detailed surface patches. Based on these, our method, ECON, has three main steps: (1) It infers detailed 2D normal maps for the front and back sides of a clothed person. (2) From these, it recovers 2.5D front and back surfaces, called d-BiNI, that are equally detailed, yet incomplete, and registers these w.r.t. each other with the help of a SMPL-X body mesh recovered from the image. (3) It "inpaints" the missing geometry between the d-BiNI surfaces. If the face and hands are noisy, they can optionally be replaced with those of SMPL-X. As a result, ECON infers high-fidelity 3D humans even in loose clothes and challenging poses. This goes beyond previous methods, according to the quantitative evaluation on the CAPE and Renderpeople datasets. Perceptual studies also show that ECON's perceived realism is better by a large margin. Code and models are available for research purposes at econ.is.tue.mpg.de
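The three-step pipeline described in the abstract can be sketched as stub code. This is only an illustrative skeleton of the data flow (image → front/back normal maps → integrated 2.5D surfaces registered against a SMPL-X depth prior → fused geometry); every function body, constant, and name below is a placeholder assumption, not the authors' implementation, which uses trained normal-prediction networks, the d-BiNI optimizer, and SMPL-X fitting.

```python
import numpy as np

def infer_normal_maps(image):
    """Step 1 (stub): predict detailed front/back clothed-body normal maps.
    Placeholder returns constant normals facing toward/away from the camera."""
    h, w, _ = image.shape
    front = np.zeros((h, w, 3)); front[..., 2] = 1.0   # camera-facing
    back = np.zeros((h, w, 3)); back[..., 2] = -1.0    # away from camera
    return front, back

def d_bini(front_normals, back_normals, smplx_depth):
    """Step 2 (stub): integrate each normal map into a 2.5D depth surface,
    coarsely registered via a SMPL-X body-prior depth map.
    Placeholder offsets the prior depth by the normal z-component."""
    front_depth = smplx_depth - 0.01 * front_normals[..., 2]
    back_depth = smplx_depth - 0.01 * back_normals[..., 2]
    return front_depth, back_depth

def inpaint_and_fuse(front_depth, back_depth):
    """Step 3 (stub): fill the missing geometry between the two surfaces.
    Placeholder takes the per-pixel midpoint as a trivial 'inpainting'."""
    return 0.5 * (front_depth + back_depth)

# Hypothetical inputs: a blank RGB image and a flat body-prior depth map.
image = np.zeros((256, 256, 3))
smplx_depth = np.full((256, 256), 2.0)

f_n, b_n = infer_normal_maps(image)
f_d, b_d = d_bini(f_n, b_n, smplx_depth)
fused = inpaint_and_fuse(f_d, b_d)
```

The skeleton only pins down the interfaces between the stages; replacing each stub with the real component would reproduce the method's structure, not its results.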
Pages: 512-523
Number of pages: 12