YOLO-Rlepose: Improved YOLO Based on Swin Transformer and Rle-Oks Loss for Multi-Person Pose Estimation

Cited by: 5
Authors
Jiang, Yi [1 ]
Yang, Kexin [1 ]
Zhu, Jinlin [1 ]
Qin, Li [2 ]
Affiliations
[1] Harbin Univ Sci & Technol, Dept Commun Engn, Harbin 150080, Peoples R China
[2] Harbin Univ Sci & Technol, Dept Engn Mech, Harbin 150080, Peoples R China
Keywords
human pose estimation; deep learning; convolutional neural network; transformer
DOI
10.3390/electronics13030563
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In recent years, there has been significant progress in human pose estimation, fueled by the widespread adoption of deep convolutional neural networks. Despite these advancements, multi-person 2D pose estimation remains highly challenging due to factors such as occlusion, noise, and non-rigid body movements. Currently, most multi-person pose estimation approaches handle joint localization and association separately. This study proposes a direct regression-based method that estimates 2D human poses from a single image; the authors name this network YOLO-Rlepose. Compared to traditional methods, YOLO-Rlepose leverages Transformer models to better capture global dependencies between image feature blocks and, through a multi-head self-attention mechanism, preserves sufficient spatial information for keypoint detection. To further improve the accuracy of YOLO-Rlepose, this paper proposes the following enhancements. First, this study introduces the C3 module with Swin Transformer (C3STR). This module builds upon the C3 module in You Only Look Once (YOLO) by incorporating a Swin Transformer branch, enhancing the model's ability to capture global and rich contextual information. Second, a novel loss function named Rle-Oks loss is proposed. It facilitates training by learning the underlying output distribution through residual log-likelihood estimation. To assign different weights according to the importance of different keypoints in the human body, this study introduces a weight coefficient into the loss function. Experiments demonstrate the effectiveness of the proposed YOLO-Rlepose model: on the COCO dataset, it outperforms the previous state-of-the-art method by 2.11% in AP.
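To make the Rle-Oks idea (a residual-log-likelihood regression loss with per-keypoint importance weights) concrete, the sketch below shows one possible form of such a loss. It is not the paper's implementation: it keeps only a simple Laplace prior and omits the learned normalizing-flow component of full residual log-likelihood estimation, and the illustrative keypoint weights are derived from the standard COCO OKS sigmas. The class name WeightedRleStyleLoss and all tensor shapes are assumptions made for this example.

```python
import torch
import torch.nn as nn

# COCO per-keypoint falloff constants (sigmas) used by the OKS metric;
# a smaller sigma means a stricter keypoint, so its error is weighted more.
COCO_SIGMAS = torch.tensor([
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089
])


class WeightedRleStyleLoss(nn.Module):
    """Hypothetical sketch of an RLE-style keypoint regression loss with
    per-keypoint importance weights (not the paper's exact Rle-Oks loss).

    Each keypoint is regressed as (mu, sigma); under a Laplace density the
    negative log-likelihood is |x - mu| / sigma + log(sigma). Full RLE would
    additionally model the residual distribution with a learned flow, which
    is omitted here for brevity.
    """

    def __init__(self, sigmas: torch.Tensor = COCO_SIGMAS):
        super().__init__()
        # Importance weight: inverse of the OKS sigma, normalized to mean 1.
        w = 1.0 / sigmas
        self.register_buffer("kpt_weight", w / w.mean())

    def forward(self, pred_mu, pred_sigma, target, visible):
        # pred_mu, pred_sigma, target: (B, K, 2); visible: (B, K) float mask.
        pred_sigma = pred_sigma.clamp(min=1e-4)
        nll = (target - pred_mu).abs() / pred_sigma + pred_sigma.log()
        nll = nll.sum(dim=-1)                   # (B, K) per-keypoint NLL
        nll = nll * self.kpt_weight * visible   # weight and mask keypoints
        return nll.sum() / visible.sum().clamp(min=1.0)
```

Usage follows the usual pattern, e.g. loss = WeightedRleStyleLoss()(pred_mu, pred_sigma, target, visible); the paper's actual Rle-Oks loss combines the residual likelihood term and the keypoint weight coefficient jointly, which this sketch only approximates.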
Pages: 16