LEARNING RECURRENT STRUCTURE-GUIDED ATTENTION NETWORK FOR MULTI-PERSON POSE ESTIMATION

Cited by: 8
Authors
Qiu, Zhongwei [1, 2]
Qiu, Kai [2 ]
Fu, Jianlong [2 ]
Fu, Dongmei [1 ]
Affiliations
[1] University of Science and Technology Beijing, Beijing, People's Republic of China
[2] Microsoft Research, Beijing, People's Republic of China
Source
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) | 2019
Keywords
Pose estimation; Attention model
DOI
10.1109/ICME.2019.00079
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Multi-person pose estimation aims to localize tens of human joints (e.g., elbows and wrists) from multiple human bodies in an image. Existing approaches mainly adopt a two-stage pipeline, which usually consists of a human detector (i.e., generating a bounding box for each person) and a single-person pose estimator (i.e., generating human joints from each bounding box). However, these approaches neglect the challenges of large pose variations and heavy occlusions in each bounding box, which often results in imprecise human joint localization. In this paper, we propose a structure-guided attention network (SGAN) for multi-person pose estimation. Specifically, a structured pose representation is encoded by learning a joint confidence map and a joint association map, which can be further refined by SGAN in a recurrent way. Note that SGAN enables a deep neural network to take the initial pose estimation as a reference and to discover multi-scale pose features as completion, and thus the learning of pose structures can be reinforced. Extensive experiments show the best single-model results against the state-of-the-art approaches, with a relative 3.5% mAP gain on the challenging COCO Keypoint dataset.
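The abstract describes the method only at a high level, so the following is a minimal sketch of the control flow it implies, not the authors' implementation. It assumes each joint type is represented by a 2D confidence map, that a joint is read out as the highest-scoring cell, and that refinement feeds the current estimate back into a refinement step a fixed number of times. The names `decode_joint`, `refine_recurrently`, and `toy_sharpen` are hypothetical, and `toy_sharpen` is a hand-written stand-in for the learned attention network.

```python
from typing import Callable, List, Tuple

Heatmap = List[List[float]]  # one H x W confidence map for a single joint type


def decode_joint(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-confidence cell in one map."""
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best[2]:
                best = (x, y, score)
    return best


def refine_recurrently(
    initial_maps: List[Heatmap],
    refine_step: Callable[[List[Heatmap]], List[Heatmap]],
    num_steps: int,
) -> List[Heatmap]:
    """Apply refine_step num_steps times, feeding each output back as input."""
    maps = initial_maps
    for _ in range(num_steps):
        maps = refine_step(maps)
    return maps


def toy_sharpen(maps: List[Heatmap]) -> List[Heatmap]:
    """Assumed placeholder for a learned refinement module.

    Squares every score and rescales each map so its peak equals 1.0,
    which suppresses low-confidence cells relative to the peak.
    """
    refined = []
    for heatmap in maps:
        squared = [[v * v for v in row] for row in heatmap]
        peak = max((v for row in squared for v in row), default=0.0)
        if peak == 0.0:
            refined.append(squared)  # nothing to rescale
        else:
            refined.append([[v / peak for v in row] for row in squared])
    return refined


if __name__ == "__main__":
    # One 2 x 2 confidence map for a single joint type (made-up values).
    initial = [[[0.1, 0.2], [0.6, 0.5]]]
    print(decode_joint(initial[0]))
    print(decode_joint(refine_recurrently(initial, toy_sharpen, 2)[0]))
```

In this toy setting the peak location is unchanged by refinement; a learned module would instead be expected to move or sharpen peaks using image features, which this sketch does not model.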
Pages: 418-423
Page count: 6