LEARNING RECURRENT STRUCTURE-GUIDED ATTENTION NETWORK FOR MULTI-PERSON POSE ESTIMATION

Cited by: 8
Authors
Qiu, Zhongwei [1, 2]
Qiu, Kai [2]
Fu, Jianlong [2]
Fu, Dongmei [1]
Affiliations
[1] Univ Sci & Technol Beijing, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
Source
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) | 2019
Keywords
Pose estimation; Attention model
DOI
10.1109/ICME.2019.00079
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Multi-person pose estimation aims to localize tens of human joints (e.g., elbow, wrist) from multiple human bodies in an image. Existing approaches mainly adopt a two-stage pipeline, which usually consists of a human detector (i.e., generating a bounding box for each person) and a single-person pose estimator (i.e., generating human joints from each bounding box). However, these approaches neglect the challenges of large pose variations and heavy occlusions within each bounding box, which often result in imprecise human joint localization. In this paper, we propose a structure-guided attention network (SGAN) for multi-person pose estimation. Specifically, a structured pose representation is encoded by learning a joint confidence map and a joint association map, which can be further refined by SGAN in a recurrent way. Note that SGAN enables a deep neural network to take the initial pose estimation as a reference and to discover multi-scale pose features as completion, so that the learning of pose structures is reinforced. Extensive experiments show the best single-model results against state-of-the-art approaches, with a relative 3.5% mAP gain on the challenging COCO Keypoint dataset.
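The recurrent refinement of joint confidence maps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `decode_joint`, `refine_recurrently`, and `sharpen` are hypothetical, the refinement step is a toy stand-in for the learned attention network, the joint association map is omitted, and decoding a joint as the peak of its confidence map is an assumed (though common) convention that the abstract does not specify.

```python
from typing import Callable, List, Tuple

Heatmap = List[List[float]]  # one H x W confidence map for a single joint


def decode_joint(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-confidence cell in one joint map."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (x, y, score)
    return best


def refine_recurrently(
    maps: List[Heatmap],
    refine_step: Callable[[List[Heatmap]], List[Heatmap]],
    num_steps: int,
) -> List[Heatmap]:
    """Feed the current estimate back into the same refinement step num_steps times."""
    current = maps
    for _ in range(num_steps):
        current = refine_step(current)
    return current


def sharpen(maps: List[Heatmap]) -> List[Heatmap]:
    """Hypothetical stand-in for a learned refinement step: square every score."""
    return [[[v * v for v in row] for row in hm] for hm in maps]


# Two toy 2 x 2 confidence maps, one per joint (assumed initial estimates).
initial = [
    [[0.1, 0.2], [0.9, 0.3]],  # joint 0: peak at x=0, y=1
    [[0.4, 0.8], [0.1, 0.2]],  # joint 1: peak at x=1, y=0
]
refined = refine_recurrently(initial, sharpen, num_steps=2)
joints = [decode_joint(hm) for hm in refined]
```

In this toy setting each pass suppresses low-confidence cells relative to the peak, which only mimics the idea of using the previous estimate as a reference for the next one.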
Pages: 418-423
Page count: 6