TMF: Temporal Motion and Fusion for action recognition

Cited by: 6

Authors:

Wang, Yanze [1]

Ye, Junyong [1]

Affiliations:

[1] Chongqing Univ, Key Lab Optoelect Technol & Syst Minist Educ, Chongqing, Peoples R China

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2021 / Vol. 213

Funding:

National Key Research and Development Program of China;

Keywords:

Action recognition; Motion extraction; Temporal crossing fusion;

DOI:

10.1016/j.cviu.2021.103304

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Temporal motion information plays an important role in video understanding, human action recognition, and related fields. Optical flow, which carries rich temporal motion information, has been widely used in visual tasks and achieves strong performance, but extracting it is time-consuming and laborious. In this paper, we propose a Temporal Motion and Fusion (TMF) module, which consists of a motion extraction (ME) module and a temporal crossing fusion (TCF) module. The ME module replaces traditional optical flow: it establishes the matching relationship between adjacent frames on the convolutional feature maps and then extracts simple and effective short-term motion information. The TCF module crosses adjacent frames and fuses the information of nonadjacent video frames to model long-term motion. Finally, the extracted motion information is fused with the appearance information captured by 2D convolution for final recognition. Experiments show that, with only a small increase in parameters and computational cost, the proposed lightweight model achieves state-of-the-art results on Something-Something-V1&V2 and Diving-48, and obtains competitive results on HMDB-51 and UCF-101 among single models.
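The idea of matching adjacent frames on feature maps, rather than computing optical flow, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `local_correlation` and `frame_matching`, the displacement window `max_disp`, the channel-averaged dot product used as the matching score, and the `gap` parameter used to pair nonadjacent frames are all assumptions made for illustration only.

```python
import numpy as np


def local_correlation(feat_a, feat_b, max_disp=1):
    """Match feat_a against spatially shifted copies of feat_b.

    feat_a, feat_b: feature maps of shape (C, H, W).
    Returns an array of shape ((2 * max_disp + 1) ** 2, H, W) holding the
    channel-averaged dot product for every displacement in the window.
    (Hypothetical matching score; the paper's exact formulation may differ.)
    """
    _, h, w = feat_a.shape
    # Zero-pad the spatial dimensions so every shift stays in bounds.
    padded = np.pad(feat_b, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    scores = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            scores.append((feat_a * shifted).mean(axis=0))
    return np.stack(scores)


def frame_matching(features, gap=1, max_disp=1):
    """Correlate each frame with the frame `gap` steps later.

    features: array of shape (T, C, H, W).
    gap=1 pairs adjacent frames (short-term motion cue); gap>1 pairs
    nonadjacent frames (an assumed stand-in for a long-term cue).
    Returns an array of shape (T - gap, D, H, W), D = (2 * max_disp + 1) ** 2.
    """
    return np.stack([
        local_correlation(features[t], features[t + gap], max_disp)
        for t in range(len(features) - gap)
    ])


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 5, 5))  # T=4 frames, C=8 channels
short_term = frame_matching(features, gap=1)
long_term = frame_matching(features, gap=2)
print(short_term.shape)  # (3, 9, 5, 5)
print(long_term.shape)   # (2, 9, 5, 5)
```

In a real network these correlation volumes would be produced from learned convolutional features and fused with the appearance branch; the sketch only shows the matching step on random data.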

References

Pages: 10

39 entries in total

[1] [Anonymous], 2012, UCF101 DATASET 101 H

[2] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].

Carreira, Joao; Zisserman, Andrew.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733

[3] Crasto N., 2019, P IEEECVF C COMPUTER

[4] Davis Y.H.N.L.S, 2018, P IEEE WINTER C APPL

[5] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848

[6] FlowNet: Learning Optical Flow with Convolutional Networks [J].

Dosovitskiy, Alexey; Fischer, Philipp; Ilg, Eddy; Haeusser, Philip; Hazirbas, Caner; Golkov, Vladimir; van der Smagt, Patrick; Cremers, Daniel; Brox, Thomas.

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2758-2766

[7] Learning Spatiotemporal Features with 3D Convolutional Networks [J].

Du Tran; Bourdev, Lubomir; Fergus, Rob; Torresani, Lorenzo; Paluri, Manohar.

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 4489-4497

[8] Convolutional Two-Stream Network Fusion for Video Action Recognition [J].

Feichtenhofer, Christoph; Pinz, Axel; Zisserman, Andrew.

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 1933-1941

[9] Res2Net: A New Multi-Scale Backbone Architecture [J].

Gao, Shang-Hua; Cheng, Ming-Ming; Zhao, Kai; Zhang, Xin-Yu; Yang, Ming-Hsuan; Torr, Philip.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (02): 652-662

[10] The "something something" video database for learning and evaluating visual common sense [J].

Goyal, Raghav; Kahou, Samira Ebrahimi; Michalski, Vincent; Materzynska, Joanna; Westphal, Susanne; Kim, Heuna; Haenel, Valentin; Fruend, Ingo; Yianilos, Peter; Mueller-Freitag, Moritz; Hoppe, Florian; Thurau, Christian; Bax, Ingo; Memisevic, Roland.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5843-5851
