Sparse Temporal Causal Convolution for Efficient Action Modeling

Cited by: 16
Authors
Cheng, Changmao [1, 2, 3]
Zhang, Chi [4]
Wei, Yichen [4]
Jiang, Yu-Gang [1, 2, 3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Fudan Univ, Joint Res Ctr Intelligent Video Technol, Shanghai, Peoples R China
[3] Jilian Technol Grp Video, Shanghai, Peoples R China
[4] Megvii Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) | 2019
Funding
National Natural Science Foundation of China
Keywords
Action Recognition; Causal Modeling; Multi-Task Learning
DOI
10.1145/3343031.3351054
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Recently, spatio-temporal convolutional networks have achieved prominent performance in action classification. However, debate over the importance of temporal information has prompted a rethinking of these architectures. In this work, we propose to employ sparse temporal convolutional operations in networks for efficient action modeling. We demonstrate that the explicit temporal feature interactions can be largely reduced without any degradation. For better scalability, we use causal convolutions for temporal feature learning. Under causality constraints, we replenish the model with auxiliary self-supervised tasks, namely video prediction and frame order discrimination. In addition, a gradient-based multi-task learning algorithm is introduced to guarantee the dominance of the action recognition task. The proposed model matches or outperforms the state-of-the-art methods on the Kinetics, Something-Something V2, UCF101 and HMDB51 datasets.
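As a rough illustration of the causality constraint mentioned in the abstract, the sketch below applies a 1D convolution along the time axis so that each output step depends only on the current and earlier inputs. This is not the authors' implementation: the function name `causal_conv1d`, the zero left-padding, and the scalar per-frame features are assumptions chosen to keep the example small.

```python
def causal_conv1d(x, w):
    """Causal 1D convolution over time (illustrative sketch only).

    x: list of per-frame scalar features, ordered by time.
    w: list of kernel weights; w[-1] multiplies the current frame,
       w[-2] the previous frame, and so on.
    """
    k = len(w)
    # Zero-pad on the left only, so no output can see future frames.
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(w[i] * padded[t + i] for i in range(k))
        for t in range(len(x))
    ]


# Usage: changing a later frame leaves all earlier outputs unchanged.
y_a = causal_conv1d([1, 2, 3, 4], [1, 1, 1])
y_b = causal_conv1d([1, 2, 3, 100], [1, 1, 1])
print(y_a[:3] == y_b[:3])
```

Because the padding is one-sided, the output has the same length as the input and can be extended frame by frame as new frames arrive, which is one common motivation for causal designs.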
Pages: 592-600
Page count: 9