Human Action Recognition Method Based on Video-Level Features and Attention Mechanism

Times cited: 0
Authors
Cai, Qiang [1, 2, 3]
Yan, Jin [1, 2, 3]
Li, Haisheng [1, 2, 3]
Deng, Yibiao [1, 2, 3]
Affiliations
[1] Beijing Technology and Business University, Beijing 100048, People's Republic of China
[2] Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing, People's Republic of China
[3] National Engineering Laboratory for Agri-product Quality Traceability, Beijing, People's Republic of China
Source
Proceedings of 2020 Chinese Intelligent Systems Conference, Vol. I | 2021 / Vol. 705
Funding
Beijing Natural Science Foundation
Keywords
Two-stream network; Snippet-level features; Video-level features; Attention mechanism; Action recognition
DOI
10.1007/978-981-15-8450-3_24
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To capture spatiotemporal information in videos and improve the network's long-term temporal modeling, two-stream networks usually adopt sparse sampling. However, feature extraction on the sampled snippets suffers from two problems: first, it is unreasonable to make individual snippet-level features correspond to video-level labels; second, salient features are not highlighted. To address these two problems, we propose an action recognition method based on video-level features and an attention mechanism (VFAM), which combines snippet-level features into video-level features and adds an attention mechanism that assigns greater weights to effective features. Good experimental results on the HMDB51 dataset reflect the superiority and robustness of our method.
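The abstract does not state how the snippets are sampled, how snippet-level features are fused, or how the attention weights are computed. The following is a minimal pure-Python sketch under assumed choices, not the authors' VFAM implementation: TSN-style sparse sampling (one frame per equal-length segment), followed by attention pooling in which a hypothetical scoring vector `w` scores each snippet feature, the scores are softmax-normalized, and the video-level feature is the weighted sum. The function names `sparse_sample_indices` and `attention_pool` are illustrative only.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def sparse_sample_indices(num_frames: int, num_segments: int,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Split [0, num_frames) into num_segments equal-length segments and pick
    one frame index per segment: a random frame if rng is given (training),
    otherwise the segment's central frame (testing)."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need num_segments >= 1 and at least one frame per segment")
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments        # inclusive
        end = (k + 1) * num_frames // num_segments    # exclusive, always > start
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(snippet_feats: Sequence[Sequence[float]],
                   w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Fuse K snippet-level features f_k into one video-level feature v.

    Hypothetical scoring (an assumption, not taken from the paper):
    s_k = w . f_k, alpha = softmax(s), v = sum_k alpha_k * f_k.
    With w = 0 every alpha_k is 1/K, i.e. plain average pooling.
    """
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in snippet_feats]
    alphas = softmax(scores)
    dim = len(snippet_feats[0])
    video = [sum(a * f[d] for a, f in zip(alphas, snippet_feats))
             for d in range(dim)]
    return video, alphas


if __name__ == "__main__":
    # Three snippets sampled from a 30-frame clip, with toy 2-D features.
    print(sparse_sample_indices(30, 3))  # -> [4, 14, 24]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    video, alphas = attention_pool(feats, w=[2.0, 0.0])
    print(alphas, video)
```

With the toy `w = [2.0, 0.0]`, snippets with a large first feature component receive larger weights, so the second snippet is down-weighted in the video-level feature; in a trained network `w` (or a small scoring sub-network) would be learned jointly with the classifier.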
Pages: 225-233
Number of pages: 9