Learning Visual Tempo for Action Recognition

Cited by: 0

Authors:

Nie, Mu [1]

Yang, Sen [2]

Yang, Wankou [2]

Affiliations:

[1] Southeast Univ, Sch Cyber Sci & Engn, Nanjing 210096, Peoples R China

[2] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China

Source:

ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2022, PT I | 2022 / Vol. 1700

Keywords:

Action recognition; Spatiotemporal; Multi-receptive field; Visual tempo; NETWORK;

DOI:

10.1007/978-981-19-7946-0_13

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The variation of visual tempo, an essential feature in action recognition, characterizes the spatiotemporal scale and the dynamics of an action. Existing models usually rely on spatiotemporal convolution to understand spatiotemporal scenes. However, they cannot cope with differences in visual tempo because their view of the temporal and spatial dimensions is limited. To address these issues, we propose a multi-receptive field spatiotemporal (MRF-ST) network to effectively model spatial and temporal information. We use dilated convolutions to obtain different receptive fields and design a dynamic weighting of the different dilation rates based on the attention mechanism. The MRF-ST network can directly obtain various tempos in the same network layer without any additional learning cost. Moreover, the network can improve the accuracy of action recognition by learning more of the visual tempos of different actions. Extensive evaluations show that MRF-ST reaches state-of-the-art performance on the UCF-101 and HMDB-51 datasets. Further analysis also indicates that MRF-ST can significantly improve performance in scenes with large variances in visual tempo.
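
The idea of combining dilated convolutions with attention-based weighting can be illustrated with a minimal sketch. This is not the authors' MRF-ST implementation: it works on a 1D temporal signal in pure Python, and the function names (`dilated_conv1d`, `multi_receptive_field`, `receptive_field`) and the attention score (mean absolute response of each branch) are assumptions chosen only for illustration. It shows how several dilation rates applied in the same layer see different temporal extents, and how softmax weights can fuse the branches.

```python
import math


def receptive_field(kernel_size, dilation):
    """Temporal extent covered by one dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated cross-correlation with zero padding."""
    center = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - center) * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_receptive_field(signal, kernel, dilations):
    """Fuse branches with different dilation rates using softmax weights.

    Assumption: the attention score of a branch is its mean absolute
    response. A real network would learn these scores instead.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    scores = [sum(abs(v) for v in b) / len(b) for b in branches]
    weights = softmax(scores)
    fused = [
        sum(w * b[t] for w, b in zip(weights, branches))
        for t in range(len(signal))
    ]
    return fused, weights


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy per-frame feature values
    fused, weights = multi_receptive_field(frames, [1.0, 0.0, -1.0], [1, 2, 4])
    print(fused, weights)
```

With a kernel of size 3, dilation rates 1, 2, and 4 cover 3, 5, and 9 time steps respectively, so a single layer can respond to both fast and slow motions without adding kernel parameters.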


Pages: 139-155

Page count: 17

