Depth-Based 3D Face Reconstruction and Pose Estimation Using Shape-Preserving Domain Adaptation

Cited by: 6
Authors
Zhong Y. [1 ]
Pei Y. [1 ]
Li P. [1 ]
Guo Y. [2 ]
Ma G. [1 ]
Liu M. [3 ]
Bai W. [4 ]
Wu W. [4 ]
Zha H. [1 ]
Affiliations
[1] Department of Machine Intelligence, Key Laboratory of Machine Perception (MOE), Peking University, Beijing
[2] Department of Computer Science, Luoyang Institute of Science and Technology, Luoyang
[3] IT Department, USens Incorporation, San Jose, CA, 95110
[4] IT Department, Huawei Technologies Company Ltd., Beijing
Source
Pei, Yuru (peiyuru@cis.pku.edu.cn) / Institute of Electrical and Electronics Engineers Inc. / Vol. 03
Keywords
Depth-based face reconstruction; pose estimation; shape code regression; shape-preserving domain adaptation;
DOI
10.1109/TBIOM.2020.3025466
CLC Number
Subject Classification Code
Abstract
Depth images are widely used in 3D head pose estimation and face reconstruction. Device-specific noise and the lack of texture constraints pose a major challenge for estimating a nonrigid deformable face from a single noisy depth image. In this article, we present a deep neural network-based framework to infer a 3D face consistent with a single depth image captured by a consumer depth camera (Kinect). Confronted with the lack of annotated depth images with facial parameters, we utilize a bidirectional CycleGAN-based generator for denoising and noisy image simulation, which helps to generalize the model learned from synthetic depth images to real noisy ones. We learn code regressors in the source (synthetic) and target (noisy) depth image domains and present a fusion scheme in the parametric space for 3D face inference. The proposed multi-level shape consistency constraint, concerning the embedded features, depth maps, and 3D surfaces, couples the code regressors and the domain adaptation, avoiding shape distortions in the CycleGAN-based generators. Experiments demonstrate that the proposed method is effective in depth-based 3D head pose estimation and expressive face reconstruction compared with the state of the art. © 2019 IEEE.
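To make the multi-level shape consistency constraint more concrete, the following minimal PyTorch-style sketch (not the authors' implementation) shows how feature-level, depth-map-level, and 3D-surface-level discrepancies between the synthetic-domain and noisy-domain branches could be combined into a single loss, together with a toy fusion of the two code regressors' outputs in the parametric space. All tensor shapes, loss types, weights, and helper names (shape_consistency_loss, fuse_codes, alpha) are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def shape_consistency_loss(feat_src, feat_tgt, depth_src, depth_tgt, verts_src, verts_tgt,
                           w_feat=1.0, w_depth=1.0, w_surf=1.0):
    # Penalize discrepancies between the synthetic (source) and noisy (target) branches
    # at three levels: embedded shape codes, re-rendered depth maps, and reconstructed
    # 3D surface vertices. The L1/L2 choices and weights are assumptions.
    loss_feat = F.mse_loss(feat_src, feat_tgt)
    loss_depth = F.l1_loss(depth_src, depth_tgt)
    loss_surf = F.l1_loss(verts_src, verts_tgt)
    return w_feat * loss_feat + w_depth * loss_depth + w_surf * loss_surf

def fuse_codes(code_src, code_tgt, alpha=0.5):
    # Toy fusion of the source- and target-domain code regressor outputs in the
    # parametric (morphable-model) space; alpha is a hypothetical blend weight.
    return alpha * code_src + (1.0 - alpha) * code_tgt

if __name__ == "__main__":
    # Dummy tensors standing in for the outputs of the two domain branches.
    feat_s, feat_t = torch.randn(4, 256), torch.randn(4, 256)
    depth_s, depth_t = torch.rand(4, 1, 128, 128), torch.rand(4, 1, 128, 128)
    verts_s, verts_t = torch.randn(4, 5023, 3), torch.randn(4, 5023, 3)
    print(shape_consistency_loss(feat_s, feat_t, depth_s, depth_t, verts_s, verts_t).item())
    print(fuse_codes(torch.randn(4, 100), torch.randn(4, 100)).shape)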
Pages: 6-15
Number of pages: 9