Learning and Distillating the Internal Relationship of Motion Features in Action Recognition

Cited by: 0
Authors
Lu, Lu [1 ]
Li, Siyuan [1 ]
Chen, Niannian [1 ]
Gao, Lin [1 ]
Fan, Yong [1 ]
Jiang, Yong [1 ]
Wu, Ling [1 ]
Affiliations
[1] Southwest Univ Sci & Technol, Mianyang, Sichuan, Peoples R China
Source
NEURAL INFORMATION PROCESSING, ICONIP 2020, PT IV | 2020 / Vol. 1332
Keywords
Action recognition; Knowledge distillation; Temporal modeling; 3D Convolution;
DOI
10.1007/978-3-030-63820-7_28
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In video-based action recognition, most advanced approaches train a two-stream architecture consisting of an appearance stream for images and a motion stream for optical flow frames. Because computing optical flow is costly and two-stream inference has high latency, knowledge distillation has been introduced to capture the two-stream representation efficiently while taking only RGB images as input. Following this technique, this paper proposes a novel distillation learning strategy to sufficiently learn and mimic the representation of the motion stream. In addition, we propose a lightweight attention-based fusion module to uniformly exploit both appearance and motion information. Experiments show that the proposed distillation strategy and fusion module outperform the baseline technique, and that our proposal outperforms known state-of-the-art single-stream and traditional two-stream methods.
Pages: 248-255
Page count: 8
Related papers
24 in total
[1]   Gatys LA, 2015, arXiv:1508.06576, DOI 10.48550/ARXIV.1508.06576
[2]   Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].
Carreira, Joao ;
Zisserman, Andrew .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733
[3]   MARS: Motion-Augmented RGB Stream for Action Recognition [J].
Crasto, Nieves ;
Weinzaepfel, Philippe ;
Alahari, Karteek ;
Schmid, Cordelia .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :7874-7883
[4]   Deep Temporal Linear Encoding Networks [J].
Diba, Ali ;
Sharma, Vivek ;
Van Gool, Luc .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1541-1550
[5]   Learning Spatiotemporal Features with 3D Convolutional Networks [J].
Du Tran ;
Bourdev, Lubomir ;
Fergus, Rob ;
Torresani, Lorenzo ;
Paluri, Manohar .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497
[6]   Convolutional Two-Stream Network Fusion for Video Action Recognition [J].
Feichtenhofer, Christoph ;
Pinz, Axel ;
Zisserman, Andrew .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :1933-1941
[7]   ActionVLAD: Learning spatio-temporal aggregation for action classification [J].
Girdhar, Rohit ;
Ramanan, Deva ;
Gupta, Abhinav ;
Sivic, Josef ;
Russell, Bryan .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :3165-3174
[8]   The "something something" video database for learning and evaluating visual common sense [J].
Goyal, Raghav ;
Kahou, Samira Ebrahimi ;
Michalski, Vincent ;
Materzynska, Joanna ;
Westphal, Susanne ;
Kim, Heuna ;
Haenel, Valentin ;
Fruend, Ingo ;
Yianilos, Peter ;
Mueller-Freitag, Moritz ;
Hoppe, Florian ;
Thurau, Christian ;
Bax, Ingo ;
Memisevic, Roland .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :5843-5851
[9]   Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [J].
Hara, Kensho ;
Kataoka, Hirokatsu ;
Satoh, Yutaka .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6546-6555
[10]   Perceptual Losses for Real-Time Style Transfer and Super-Resolution [J].
Johnson, Justin ;
Alahi, Alexandre ;
Li Fei-Fei .
COMPUTER VISION - ECCV 2016, PT II, 2016, 9906 :694-711