Learning Visual Tempo for Action Recognition

Times Cited: 0
Authors
Nie, Mu [1 ]
Yang, Sen [2 ]
Yang, Wankou [2 ]
Affiliations
[1] Southeast Univ, Sch Cyber Sci & Engn, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
Source
ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2022, PT I | 2022, Vol. 1700
Keywords
Action recognition; Spatiotemporal; Multi-receptive field; Visual tempo; NETWORK;
DOI
10.1007/978-981-19-7946-0_13
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The variation of visual tempo, an essential cue in action recognition, characterizes the spatiotemporal scale and dynamics of an action. Existing models usually rely on spatiotemporal convolution to understand spatiotemporal scenes, but they cannot cope with variations in visual tempo because of their limited receptive fields along the temporal and spatial dimensions. To address this issue, we propose a multi-receptive field spatiotemporal (MRF-ST) network that effectively models spatial and temporal information. We utilize dilated convolutions to obtain different receptive fields and design attention-based dynamic weighting over the different dilation rates. The MRF-ST network can thus capture various tempos within the same network layer without any additional learning cost, and it improves recognition accuracy by learning more of the visual tempo of different actions. Extensive evaluations show that MRF-ST achieves state-of-the-art performance on the UCF-101 and HMDB-51 datasets. Further analysis indicates that MRF-ST significantly improves performance in scenes with large variance in visual tempo.
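The core idea of the abstract (parallel dilated convolutions with attention-derived weights over dilation rates) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the shared smoothing kernel, and the use of mean activation magnitude as the attention score are all illustrative assumptions.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Temporal 1-D convolution with a given dilation rate and 'same' padding.

    x: (T, C) feature sequence; kernel: (K,) temporal filter shared across channels.
    """
    T, _ = x.shape
    K = len(kernel)
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for t in range(T):
        for k in range(K):
            out[t] += kernel[k] * xp[t + k * dilation]
    return out

def mrf_temporal_block(x, dilations=(1, 2, 4), kernel=None):
    """Multi-receptive-field block (illustrative): run parallel dilated
    convolutions over the temporal axis, then fuse the branches with
    softmax weights derived from each branch's mean response
    (a stand-in for the attention mechanism described in the abstract)."""
    if kernel is None:
        kernel = np.array([0.25, 0.5, 0.25])  # simple smoothing filter
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    scores = np.array([np.abs(b).mean() for b in branches])  # per-branch score
    weights = np.exp(scores) / np.exp(scores).sum()          # softmax
    fused = sum(w * b for w, b in zip(weights, branches))
    return fused, weights
```

Each dilation rate covers a different temporal extent with the same kernel size, so one layer sees several tempos at once; the learned (here, heuristic) weights decide which receptive field dominates for a given input.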
Pages: 139 - 155 (17 pages)