Learning Visual Tempo for Action Recognition

Times cited: 0
Authors
Nie, Mu [1]
Yang, Sen [2]
Yang, Wankou [2]
Affiliations
[1] Southeast Univ, Sch Cyber Sci & Engn, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
Source
ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2022, PT I | 2022 / Vol. 1700
Keywords
Action recognition; Spatiotemporal; Multi-receptive field; Visual tempo; Network
DOI
10.1007/978-981-19-7946-0_13
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Variation in visual tempo, an essential cue in action recognition, characterizes the spatiotemporal scale and dynamics of an action. Existing models typically rely on spatiotemporal convolutions to understand spatiotemporal scenes. However, because of their limited view along the temporal and spatial dimensions, they cannot cope with differences in visual tempo. To address this issue, this paper proposes a multi-receptive field spatiotemporal (MRF-ST) network that effectively models spatial and temporal information. We use dilated convolutions to obtain different receptive fields and design an attention-based dynamic weighting of the branches with different dilation rates. The MRF-ST network can thus capture various tempos within the same network layer without any additional learning cost. Moreover, by learning a wider range of visual tempos across different actions, the network improves action recognition accuracy. Extensive evaluations show that MRF-ST achieves state-of-the-art performance on the UCF-101 and HMDB-51 datasets. Further analysis indicates that MRF-ST significantly improves performance in scenes with large variance in visual tempo.
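The abstract's core mechanism consists of parallel dilated convolutions with different dilation rates whose outputs are fused by attention-based dynamic weights. It can be illustrated with a toy, single-channel temporal sketch. This is not the authors' MRF-ST implementation. The shared kernel across branches, the pool-then-softmax gate, and the names `dilated_conv1d`, `multi_receptive_field` and `gate` are hypothetical choices made for illustration. The real network operates on multi-channel spatiotemporal feature maps, and the abstract does not specify its attention design.

```python
import math
from typing import List, Sequence, Tuple


def effective_span(k: int, dilation: int) -> int:
    """Number of frames covered by a size-k kernel at the given dilation rate."""
    return (k - 1) * dilation + 1


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Temporal dilated convolution (cross-correlation) with zero 'same' padding.

    Assumes an odd kernel size so the kernel is centred on each output frame.
    """
    pad = (effective_span(len(kernel), dilation) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - pad + i * dilation  # taps are `dilation` frames apart
            if 0 <= idx < len(x):         # zero padding outside the clip
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_receptive_field(
    x: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int],
    gate: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse dilated branches with softmax weights (hypothetical toy, not MRF-ST itself).

    All branches reuse the same kernel, so extra dilation rates add no
    convolution weights; `gate` is a hypothetical per-branch scoring parameter.
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    pooled = [sum(b) / len(b) for b in branches]              # global average pooling per branch
    weights = softmax([g * p for g, p in zip(gate, pooled)])  # one weight per branch, sums to 1
    fused = [sum(w * b[t] for w, b in zip(weights, branches)) for t in range(len(x))]
    return fused, weights


if __name__ == "__main__":
    impulse = [0.0] * 9
    impulse[4] = 1.0  # a single "event" in the middle of a 9-frame clip
    for d in (1, 2, 3):
        resp = dilated_conv1d(impulse, [1.0, 1.0, 1.0], d)
        print(d, effective_span(3, d), [t for t, v in enumerate(resp) if v != 0.0])
    fused, weights = multi_receptive_field(impulse, [1.0, 1.0, 1.0], (1, 2, 3), (0.0, 0.0, 3.0))
    print([round(w, 3) for w in weights])
```

With the impulse input, the d = 1, 2, 3 branches respond at frames [3, 4, 5], [2, 4, 6] and [1, 4, 7]. These are spans of 3, 5 and 7 frames produced by the same three weights, which is the intuition behind covering different tempos within one layer. Sharing the kernel is one way to widen the receptive field without extra convolution weights. Whether MRF-ST does exactly this is an assumption.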
Pages: 139-155
Number of pages: 17