Image-based action recognition using hint-enhanced deep neural networks

Cited by: 44
Authors
Qi, Tangquan [1 ]
Xu, Yong [1 ]
Quan, Yuhui [1 ]
Wang, Yaodong [1 ]
Ling, Haibin [1 ,2 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Guangdong, Peoples R China
[2] Temple Univ, Comp & Informat Sci Dept, Ctr Informat Sci & Technol, Philadelphia, PA 19122 USA
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Pose hints; Convolutional neural networks;
DOI
10.1016/j.neucom.2017.06.041
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
While human action recognition from still images finds wide applications in computer vision, it remains a very challenging problem. Compared with video-based approaches, image-based action representation and recognition cannot access the motion cues of an action, which greatly increases the difficulty of dealing with pose variations and cluttered backgrounds. Motivated by the recent success of convolutional neural networks (CNNs) in learning discriminative features from objects in the presence of variations and cluttered backgrounds, in this paper we investigate the potential of CNNs for image-based action recognition. A new action recognition method is proposed that implicitly integrates pose hints into the CNN framework: we take a CNN originally trained for object recognition as a base network and transfer it to action recognition by training it jointly with pose inference. Such a joint training scheme can guide the network towards pose inference and meanwhile suppress unrelated knowledge inherited from the base network. For further performance improvement, the training data is augmented by enriching the pose-related samples. Experimental results on three benchmark datasets demonstrate the effectiveness of our method. (C) 2017 Elsevier B.V. All rights reserved.
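The joint training scheme summarized in the abstract amounts to a multi-task objective over a shared backbone: an action-classification loss plus an auxiliary pose-inference loss. The sketch below is illustrative only; the two-head formulation, the cross-entropy choice for the pose term, and the trade-off weight `lam` are assumptions for exposition, not the authors' exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true classes.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def joint_loss(action_logits, action_labels, pose_logits, pose_labels, lam=0.5):
    # Joint objective: action classification plus an auxiliary pose term.
    # The pose term nudges the shared features toward pose-relevant cues;
    # lam is a hypothetical trade-off weight, not a value from the paper.
    l_action = cross_entropy(action_logits, action_labels)
    l_pose = cross_entropy(pose_logits, pose_labels)
    return l_action + lam * l_pose

# Toy usage: two samples, two action classes, two pose-hint classes.
action_logits = np.array([[2.0, 0.0], [0.0, 2.0]])
pose_logits = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
loss = joint_loss(action_logits, labels, pose_logits, labels, lam=0.5)
```

Setting `lam = 0` recovers plain fine-tuning for action recognition; a positive `lam` is what couples the transfer to pose inference.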
Pages: 475-488
Page count: 14
References
35 entries in total
[1] [Anonymous], P IEEE C COMP VIS PA
[2] [Anonymous], ADV NEURAL INFORM PR
[3] Ben Amor, Boulbaba; Su, Jingyong; Srivastava, Anuj. Action Recognition Using Rate-Invariant Analysis of Skeletal Shape Trajectories [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(01): 1-13
[4] Bourdev L., 2010, DETECTING PEOPLE USI
[5] Bourdev, Lubomir; Malik, Jitendra. Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations [J]. 2009 IEEE 12th International Conference on Computer Vision (ICCV), 2009: 1365-1372
[6] Collobert R., 2008, P 25 ICML, P160, DOI 10.1145/1390156.1390177
[7] Everingham, Mark; Eslami, S. M. Ali; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew. The PASCAL Visual Object Classes Challenge: A Retrospective [J]. International Journal of Computer Vision, 2015, 111(01): 98-136
[8] Girshick, 2015, P IEEE INT C COMP VI, DOI 10.1109/ICCV.2015.169
[9] Girshick R., 2014, IEEE C COMP VIS PATT, DOI 10.1109/CVPR.2014.81
[10] Gkioxari, Georgia; Girshick, Ross; Malik, Jitendra. Contextual Action Recognition with R*CNN [J]. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 1080-1088