VLAD3: Encoding Dynamics of Deep Features for Action Recognition

Cited by: 59
Authors
Li, Yingwei [1]
Li, Weixin [1]
Mahadevan, Vijay
Vasconcelos, Nuno [1]
Affiliations
[1] Univ Calif San Diego, San Diego, CA 92103 USA
Source
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2016
Keywords
VIDEO;
DOI
10.1109/CVPR.2016.215
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previous approaches to action recognition with deep features tend to process video frames only within a small temporal region, and do not model long-range dynamic information explicitly. However, such information is important for the accurate recognition of actions, especially for the discrimination of complex activities that share sub-actions, and when dealing with untrimmed videos. Here, we propose a representation, VLAD for Deep Dynamics (VLAD3), that accounts for different levels of video dynamics. It captures short-term dynamics with deep convolutional neural network features, relying on linear dynamic systems (LDS) to model medium-range dynamics. To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video, to arrive at the final VLAD3 representation. An extensive evaluation was performed on Olympic Sports, UCF101 and THUMOS15, where the use of the VLAD3 representation leads to state-of-the-art results.
Pages: 1951-1960
Number of pages: 10