Learning Visual Tempo for Action Recognition

Cited by: 0
Authors
Nie, Mu [1]
Yang, Sen [2]
Yang, Wankou [2]
Affiliations
[1] Southeast Univ, Sch Cyber Sci & Engn, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
Source
ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2022, PT I | 2022 / Vol. 1700
Keywords
Action recognition; Spatiotemporal; Multi-receptive field; Visual tempo; NETWORK;
DOI
10.1007/978-981-19-7946-0_13
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Variation in visual tempo, an essential feature in action recognition, characterizes the spatiotemporal scale and dynamics of an action. Existing models usually rely on spatiotemporal convolution to understand spatiotemporal scenes. However, they cannot cope with differences in visual tempo because their receptive fields in the temporal and spatial dimensions are limited. To address this issue, we propose a multi-receptive field spatiotemporal (MRF-ST) network that effectively models spatial and temporal information. We use dilated convolutions to obtain different receptive fields and design an attention-based dynamic weighting over the different dilation rates. The MRF-ST network can directly obtain various tempos in the same network layer without any additional learning cost. Moreover, the network can improve action recognition accuracy by learning more of the visual tempos of different actions. Extensive evaluations show that MRF-ST reaches state-of-the-art performance on the UCF-101 and HMDB-51 datasets. Further analysis also indicates that MRF-ST can significantly improve performance in scenes with large variance in visual tempo.
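The abstract's core idea, parallel dilated convolutions whose outputs are mixed by attention weights, can be illustrated with a minimal one-dimensional sketch along the time axis. This is not the authors' MRF-ST architecture: the function names, the scalar `score_weights` standing in for learned attention parameters, and the use of a global temporal average as the attention descriptor are all assumptions made for illustration.

```python
import math


def receptive_field(kernel_size, dilation):
    """Temporal extent covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution along time."""
    center = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - center) * dilation
            if 0 <= idx < len(signal):  # positions outside the clip count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_receptive_field(signal, kernel, dilations, score_weights):
    """Fuse branches with different dilation rates using softmax weights.

    score_weights are hypothetical stand-ins for learned attention
    parameters, one scalar per branch (an assumption of this sketch).
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    # One descriptor per branch: its average response over time.
    descriptors = [sum(b) / len(b) for b in branches]
    weights = softmax([w * g for w, g in zip(score_weights, descriptors)])
    fused = [
        sum(w * b[t] for w, b in zip(weights, branches))
        for t in range(len(signal))
    ]
    return fused, weights


# Usage: a toy per-frame feature with a periodic "tempo".
signal = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
fused, weights = multi_receptive_field(
    signal, [1.0, 1.0, 1.0], dilations=[1, 2, 3], score_weights=[0.5, 0.5, 0.5]
)
```

With a kernel of size 3, dilation rates 1, 2, and 3 cover 3, 5, and 7 frames respectively, so the branches see the same clip at different temporal scales while sharing the same number of kernel weights. A real video model would apply this to 3D feature maps with learned kernels rather than to a scalar sequence.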
Pages: 139-155
Number of pages: 17