Adaptive Positive Sample Selection and Dynamic Soft Label Assignment for Keypoint Detection

Cited by: 0

Authors:

Tang, Wenxiao [1]

Chen, Shiqi [2]

Wang, Minghui [1]

Saad Shakeel, M. [1]

Jin, Jian [3]

Kang, Wenxiong [1,4,5]

Lin, Weisi [3]

Affiliations:

[1] South China Univ Technol SCUT, Sch Comp Sci & Engn, Guangzhou 510640, Peoples R China

[2] TP Link Technol Co LTD, Shenzhen 518132, Peoples R China

[3] Nanyang Technol Univ NTU, Coll Comp & Data Sci, Singapore 639798, Singapore

[4] Pazhou Lab, Guangzhou 510335, Peoples R China

[5] Guangdong Enterprise Key Lab Intelligent Finance, Guangzhou 510705, Peoples R China

Source:

IEEE Transactions on Circuits and Systems for Video Technology | 2024 / Vol. 34 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Heating systems; Pose estimation; Classification algorithms; Circuits and systems; Training; Pedestrians; Object detection; Human pose estimation; adaptive positive sample selection; dynamic soft label assignment; vector-level post-processing;

DOI:

10.1109/TCSVT.2024.3434563

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Pose estimation plays a crucial role in human-centered vision applications. Some recent efforts perform pose estimation through keypoint detection. Drawing inspiration from object detection, they treat keypoints as objects and achieve unbiased estimation by implementing classification and regression heads. However, they still fail to detect heavily occluded keypoints satisfactorily and require elaborate, unavoidable post-processing steps. After a thorough exploration of the characteristics of keypoints, we develop a novel Adaptive positive Sample selection and dynamic soft Label Assignment (ASLA) scheme tailored for keypoint detection. Specifically, in the training phase we select positive samples for each keypoint according to the sum of two distances to the corresponding ground truth (GT): the distance from the sample coordinates and the distance from their predicted coordinates. For occluded keypoints, the positive samples defined by our method may fall in the semantically relevant regions of pedestrians rather than the spatially adjacent regions of obstructions, significantly improving their localization performance. Meanwhile, we dynamically assign classification labels to these positive samples based on the distance between their predicted coordinates and the corresponding GT, which ensures that high-quality positive samples receive high classification labels. Owing to this design, the post-processing step is not essential, although a simple vector-level post-processing step can add a further improvement. Finally, we extensively evaluate ASLA on two popular human pose estimation benchmarks, COCO and MPII, and comprehensive experiments show that ASLA significantly outperforms state-of-the-art algorithms. Our code and models will be available at https://github.com/SCUT-BIP-Lab/ASLA.
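
The selection and labeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the top-k selection rule, the exponential soft-label mapping, the function name `select_positives_and_soft_labels`, and the parameters `k` and `sigma` are all assumptions made for illustration only.

```python
import numpy as np


def select_positives_and_soft_labels(sample_xy, pred_xy, gt_xy, k=3, sigma=8.0):
    """Hypothetical sketch of summed-distance selection with soft labels.

    sample_xy: (N, 2) coordinates of candidate sample locations.
    pred_xy:   (N, 2) keypoint coordinates predicted at each sample.
    gt_xy:     (2,)   ground-truth keypoint coordinate.
    k, sigma:  illustrative hyperparameters, not taken from the paper.
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)

    # Distance from each sample location to the ground truth.
    d_sample = np.linalg.norm(sample_xy - gt_xy, axis=1)
    # Distance from each sample's prediction to the ground truth.
    d_pred = np.linalg.norm(pred_xy - gt_xy, axis=1)
    # Summed distance used as the selection cost (lower is better).
    cost = d_sample + d_pred

    # Assumption: keep the k lowest-cost samples as positives.
    k = min(k, len(cost))
    pos_idx = np.argsort(cost, kind="stable")[:k]

    # Assumption: soft label decays exponentially with prediction error,
    # so more accurate positives receive labels closer to 1.
    labels = np.zeros(len(cost))
    labels[pos_idx] = np.exp(-d_pred[pos_idx] / sigma)
    return pos_idx, labels


# Toy example: sample 2 is far from the GT but predicts it well,
# while sample 3 is closer to the GT but predicts it badly.
samples = [[0, 0], [10, 0], [30, 0], [4, 3]]
preds = [[9, 0], [10, 0], [10, 1], [40, 40]]
gt = [10, 0]
pos_idx, labels = select_positives_and_soft_labels(samples, preds, gt, k=3)
```

In this toy input the distant but accurate sample is selected as a positive and the nearby but inaccurate one is not, which mirrors the abstract's point that positives need not be spatially adjacent to the keypoint.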


Pages: 12665-12675

Page count: 11
