Human-pose estimation based on weak supervision

Cited by: 8
Authors
Hu X. [1]
Bao X. [1]
Wei G. [1]
Li Z. [1]
Affiliation
[1] School of Artificial Intelligence, Beijing Normal University, Beijing
Source
Virtual Reality and Intelligent Hardware | 2023, Vol. 5, Issue 4
Keywords
Clothing estimation; Human pose estimation; Weak supervision
DOI
10.1016/j.vrih.2022.08.010
Abstract
Background: In computer vision, simultaneously estimating human pose, shape, and clothing is a practical real-world problem, but it remains challenging owing to the variety of clothing, the complexity of deformation, the shortage of large-scale datasets, and the difficulty of estimating clothing style.
Methods: We propose a multistage weakly supervised method that makes full use of data with sparse labels to learn to estimate human body shape, pose, and clothing deformation. In the first stage, the SMPL human-body model parameters are regressed from multi-view 2D key points of the human body. Using multi-view information as weak supervision avoids the depth ambiguity of a single view, yields a more accurate human pose, and makes the supervisory information easy to obtain. In the second stage, clothing is represented by a PCA-based model whose parameters are regressed using 2D key points of the clothing as supervision. In the third stage, we predefine an embedding graph for each type of clothing to describe its deformation, and the clothing mask is then used to further refine that deformation. To facilitate training, we constructed a multi-view synthetic dataset based on BCNet and SURREAL.
Results: Experiments show that, using only weakly supervised information, our method reaches the same level of accuracy as SOTA methods that rely on strong supervision. Because weak supervision is much easier to obtain, our method has the advantage of being able to exploit existing data for training. Experiments on the DeepFashion2 dataset show that our method can make full use of the available weak supervision to fine-tune on a dataset with little annotation, whereas strongly supervised methods cannot be trained or adjusted on it owing to the lack of exact annotations.
Conclusions: Our weakly supervised method accurately estimates human body shape, pose, and several common types of clothing, and it mitigates the current shortage of clothing data. © 2022 Beijing Zhongke Journal Publishing Co. Ltd
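The record contains no code, but the stage-1 idea above (fitting SMPL shape and pose from multi-view 2D key points alone) lends itself to a short illustration. The sketch below is not the authors' implementation: the smpl_joints callable, the camera tuple format, and the per-joint confidence weighting are assumptions made for the example.

import torch

def project(points_3d, K, R, t):
    # Pinhole projection of (J, 3) world-space points with intrinsics K (3, 3),
    # rotation R (3, 3), and translation t (3,); returns (J, 2) pixel coordinates.
    cam = points_3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def multiview_reprojection_loss(smpl_joints, betas, pose, cameras, keypoints_2d, conf):
    # smpl_joints : callable(betas, pose) -> (J, 3) 3D joints; a hypothetical
    #               stand-in for an SMPL forward pass, not a real API.
    # cameras     : list of (K, R, t) tuples, one per view.
    # keypoints_2d: (V, J, 2) annotated 2D key points per view.
    # conf        : (V, J) key-point confidences used as per-joint weights.
    joints_3d = smpl_joints(betas, pose)
    loss = 0.0
    for v, (K, R, t) in enumerate(cameras):
        uv = project(joints_3d, K, R, t)
        loss = loss + (conf[v, :, None] * (uv - keypoints_2d[v]) ** 2).sum()
    return loss / len(cameras)

def fit(smpl_joints, cameras, keypoints_2d, conf, n_joints=24, iters=500):
    # Per-sample optimization of shape (betas) and axis-angle pose, supervised
    # only by multi-view 2D key points (no 3D ground truth).
    betas = torch.zeros(10, requires_grad=True)
    pose = torch.zeros(n_joints * 3, requires_grad=True)
    opt = torch.optim.Adam([betas, pose], lr=1e-2)
    for _ in range(iters):
        opt.zero_grad()
        loss = multiview_reprojection_loss(smpl_joints, betas, pose,
                                           cameras, keypoints_2d, conf)
        loss.backward()
        opt.step()
    return betas.detach(), pose.detach()

In the paper's pipeline the parameters are regressed by a network rather than optimized per sample; the sketch only illustrates how multi-view 2D key points can supervise a 3D body model without any 3D ground truth.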
Pages: 366-377
Number of pages: 11
References
30 in total
[1]  
Newcombe R.A., Fox D., Seitz S.M., DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 343-352, (2015)
[2]  
Guo K., Xu F., Yu T., Liu X., Dai Q., Liu Y., Real-time geometry, albedo, and motion reconstruction using a single RGB-D camera, ACM Transactions on Graphics, 36, 3, pp. 1-13, (2017)
[3]  
Innmann M., Zollhofer M., Niessner M., Theobalt C., Stamminger M., VolumeDeform: real-time volumetric non-rigid reconstruction, European Conference on Computer Vision (ECCV), (2016)
[4]  
Yu T., Guo K.W., Xu F., Dong Y., Su Z.Q., Zhao J.H., Li J.G., Dai Q.H., Liu Y.B., BodyFusion: real-time capture of human motion and surface geometry using a single depth camera, 2017 IEEE International Conference on Computer Vision (ICCV), pp. 910-919, (2017)
[5]  
Yu T., Zheng Z.R., Guo K.W., Zhao J.H., Dai Q.H., Li H., Pons-Moll G., Liu Y.B., DoubleFusion: real-time capture of human performances with inner body shapes from a single depth sensor, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7287-7296, (2018)
[6]  
Loper M., Mahmood N., Romero J., Pons-Moll G., Black M.J., SMPL: a skinned multi-person linear model, ACM Transactions on Graphics, 34, 6, pp. 1-16, (2015)
[7]  
Karambakhsh A., Kamel A., Sheng B., Li P., Yang P., Feng D.D., Deep gesture interaction for augmented anatomy learning, International Journal of Information Management, 45, pp. 328-336, (2019)
[8]  
Vlasic D., Baran I., Matusik W., Popovic J., Articulated mesh animation from multi-view silhouettes, ACM Transactions on Graphics, 27, 3, pp. 1-9, (2008)
[9]  
Gall J., Stoll C., de Aguiar E., Theobalt C., Rosenhahn B., Seidel H.P., Motion capture using joint skeleton tracking and surface estimation, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1746-1753, (2009)
[10]  
Liu Y.B., Stoll C., Gall J., Seidel H.P., Theobalt C., Markerless motion capture of interacting characters using multi-view image segmentation, 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249-1256, (2011)