Learning Semantic-Aligned Action Representation

Cited by: 12
Authors
Ni, Bingbing [1]
Li, Teng [2]
Yang, Xiaokang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China
[2] Anhui Univ, Coll Elect Engn & Automat, Hefei 230061, Anhui, Peoples R China
Keywords
Deep convolutional neural network (DCNN); end-to-end model; semantic alignment; sparse; action recognition
DOI
10.1109/TNNLS.2017.2731775
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A fundamental bottleneck in achieving highly discriminative action representation is that local motion/appearance features are usually not semantically aligned. That is, a local feature, such as a motion vector or motion trajectory, carries no attribute indicating which moving body part or operated object it is associated with. As a result, most methods fall back on global feature pooling/representation learning, which is often too coarse. Inspired by the recent success of end-to-end (pixel-to-pixel) deep convolutional neural networks (DCNNs), in this paper we first propose a DCNN architecture that maps a human-centric image region onto human body part response maps. Based on these response maps, we propose a second DCNN that performs semantic-aligned feature representation learning. The prior knowledge that only a few parts are responsible for a given action is also exploited by introducing a group (part) sparseness prior during feature learning. The learned semantic-aligned feature not only boosts the discriminative capability of the action representation, but is also robust to pose variations and occlusions. Finally, an iterative mining method is employed to learn discriminative action primitive detectors. Extensive experiments on action recognition benchmarks demonstrate the superior recognition performance of the proposed framework.
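The abstract does not spell out the form of the group (part) sparseness prior. As a rough illustration only, the sketch below uses a standard group-lasso (L2,1) penalty, which is one common way to encourage whole groups of weights to vanish together. The function names, the layout of three parts with two weights each, and the step size are all hypothetical and are not taken from the paper.

```python
import numpy as np

def group_sparsity_penalty(weights, groups):
    """Sum of the L2 norms of each group's weights (group-lasso / L2,1 penalty)."""
    return float(sum(np.linalg.norm(weights[idx]) for idx in groups))

def group_soft_threshold(weights, groups, step):
    """Proximal step for the penalty above.

    Each group's norm is shrunk by `step`; a group whose norm is
    below `step` is set to zero as a whole.
    """
    out = weights.copy()
    for idx in groups:
        norm = np.linalg.norm(weights[idx])
        scale = max(0.0, 1.0 - step / norm) if norm > 0 else 0.0
        out[idx] = scale * weights[idx]
    return out

# Hypothetical layout (an assumption): 3 body parts, 2 feature weights each.
part_groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
w = np.array([3.0, 4.0, 0.1, 0.1, 0.0, 0.0])

penalty = group_sparsity_penalty(w, part_groups)
w_shrunk = group_soft_threshold(w, part_groups, step=1.0)
```

In this toy case the weakly weighted second part is zeroed out entirely while the dominant first part is only shrunk, which mirrors the intuition that only a few parts should contribute to a given action.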
Pages: 3715-3725
Page count: 11