Second-order Temporal Pooling for Action Recognition

Cited by: 21
Authors
Cherian, Anoop [1]
Gould, Stephen [1]
Affiliations
[1] Australian Natl Univ, Australian Ctr Robot Vis, Canberra, ACT, Australia
Funding
Australian Research Council;
Keywords
Action recognition; Deep Learning; Kernel descriptors; Second-order statistics; Pooling; Image Representations; End-to-end learning; Region covariance descriptors;
DOI
10.1007/s11263-018-1111-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated into video-level representations by computing statistics on these features. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolution of clip-level CNN features computed across the video. Such a descriptor, while being computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than its first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained datasets such as MPII Cooking Activities and JHMDB, as well as the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes, which, when combined with hand-crafted features (as is standard practice), achieve state-of-the-art accuracy.
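The pooling idea described in the abstract can be illustrated with a small sketch. It is not the authors' formulation: it assumes that clip-level features are stacked into a T x d matrix (T clips, d feature dimensions) and that "similarity between temporal evolutions" is measured by Pearson correlation between feature trajectories. The function name `temporal_correlation_pooling`, the `eps` parameter, and the random input are hypothetical and used only for illustration.

```python
import numpy as np

def temporal_correlation_pooling(clip_features, eps=1e-8):
    """Pool a (T, d) matrix of clip-level features into a (d, d) correlation matrix.

    Assumption: each column is the trajectory of one feature over T clips, and
    similarity between trajectories is Pearson correlation. This is an
    illustrative sketch, not the method as specified in the paper.
    """
    X = np.asarray(clip_features, dtype=np.float64)   # shape (T, d)
    Xc = X - X.mean(axis=0, keepdims=True)            # center each trajectory over time
    norms = np.linalg.norm(Xc, axis=0, keepdims=True) # per-feature trajectory norm
    Xn = Xc / (norms + eps)                           # unit-norm trajectories
    C = Xn.T @ Xn                                     # (d, d) second-order descriptor
    # The matrix is symmetric, so the upper triangle carries all the information.
    upper = C[np.triu_indices(C.shape[0])]
    return C, upper

# Hypothetical input: 10 clips, 4-dimensional features per clip.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 4))
C, desc = temporal_correlation_pooling(feats)
print(C.shape, desc.shape)
```

By contrast, average pooling of the same input would yield a single d-dimensional vector (`feats.mean(axis=0)`), which discards how pairs of features co-vary over time.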
Pages: 340-362
Number of pages: 23