Human Action Recognition Method Based on Video-Level Features and Attention Mechanism

Cited by: 0
Authors
Cai, Qiang [1, 2, 3]
Yan, Jin [1, 2, 3]
Li, Haisheng [1, 2, 3]
Deng, Yibiao [1, 2, 3]
Affiliations
[1] Beijing Technol & Business Univ, Beijing 100048, Peoples R China
[2] Beijing Key Lab Big Data Technol Food Safety, Beijing, Peoples R China
[3] Natl Engn Lab Agri Product Qual Traceabil, Beijing, Peoples R China
Source
PROCEEDINGS OF 2020 CHINESE INTELLIGENT SYSTEMS CONFERENCE, VOL I | 2021 / Vol. 705
Funding
Beijing Natural Science Foundation;
Keywords
Two-stream network; Snippet-level features; Video-level features; Attention mechanism; Action recognition;
DOI
10.1007/978-981-15-8450-3_24
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
To capture the spatiotemporal information in a video and improve the network's long-term modeling capability, two-stream networks usually adopt sparse sampling. However, feature extraction on the sampled snippets has two problems. First, it is unreasonable to match snippet-level features directly to video-level labels. Second, salient features are not highlighted. To address both problems, we propose an action recognition method based on video-level features and an attention mechanism (VFAM). The method combines snippet-level features to generate video-level features and adds an attention mechanism that gives effective features greater weight. It achieves good experimental results on the HMDB51 dataset, reflecting the superiority and robustness of our method.
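The idea in the abstract, combining snippet-level features into one video-level feature while giving more weight to the more useful ones, can be illustrated with a minimal sketch. This is not the authors' VFAM architecture, which the abstract does not specify. It assumes a simple softmax attention over snippets, and the names `video_level_feature` and `scoring_vector` and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def video_level_feature(snippet_features, scoring_vector):
    """Fuse snippet-level features into a single video-level feature.

    Assumption: each snippet is scored by a dot product with a
    (hypothetical, normally learned) scoring vector, and the scores are
    normalized with softmax to obtain attention weights.
    """
    scores = [
        sum(f * w for f, w in zip(feature, scoring_vector))
        for feature in snippet_features
    ]
    weights = softmax(scores)
    dim = len(snippet_features[0])
    # Attention-weighted sum over snippets, one value per feature channel.
    fused = [
        sum(wt * feature[d] for wt, feature in zip(weights, snippet_features))
        for d in range(dim)
    ]
    return fused, weights


# Toy input: three sparsely sampled snippets with 2-dimensional features.
snippets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scoring = [1.0, 1.0]
feature, weights = video_level_feature(snippets, scoring)
print(feature, weights)
```

In this toy case the third snippet gets the highest score and therefore the largest weight, so it contributes most to the fused feature. A real model would compute the snippet features with the two-stream network and learn the attention parameters during training.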
Pages: 225-233
Page count: 9