Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model

Cited by: 1
Authors
Pan Na [1 ]
Jiang Min [1 ]
Kong Jun [1 ]
Affiliations
[1] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Jiangsu, Peoples R China
Keywords
machine vision; action recognition; two-stream network; attention; deep learning; interaction; SPATIAL-TEMPORAL ATTENTION; VIDEO;
DOI
10.3788/LOP57.181506
CLC Number
TM (Electrical Engineering); TN (Electronics and Communication Technology);
Discipline Code
0808; 0809;
Abstract
A human action recognition algorithm based on a spatio-temporal interactive attention model (STIAM) is proposed to address the low recognition accuracy caused by the two-stream network's inability to effectively extract the valid frames in each video and the valid regions in each frame. First, the proposed algorithm applies two different deep learning networks to extract spatial and temporal features, respectively. Second, a mask-guided spatial attention model is designed to compute the salient regions in each frame. Third, an optical-flow-guided temporal attention model is designed to locate the salient frames in each video. Finally, the weights obtained from the temporal and spatial attention models are applied to the spatial and temporal features, respectively, so that the model realizes spatio-temporal interaction. Experimental results on the UCF101 and Penn Action datasets show that, compared with existing methods, STIAM achieves strong feature extraction performance and clearly improves action recognition accuracy.
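The abstract gives no implementation details, so the cross-weighting idea it describes (temporal attention re-weighting spatial features and spatial attention re-weighting temporal features) can only be sketched under assumptions. In the rough NumPy sketch below, the tensor shapes and the energy-based attention scores are illustrative placeholders, not the paper's mask-guided or optical-flow-guided models:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: T frames, C channels, an HxW spatial grid.
T, C, H, W = 8, 16, 7, 7
rng = np.random.default_rng(0)
spatial_feat = rng.standard_normal((T, C, H, W))   # appearance-stream features
temporal_feat = rng.standard_normal((T, C, H, W))  # motion-stream features

# Spatial attention: a per-frame saliency map over the H*W locations
# (mask-guided in the paper; derived here from channel energy as a stand-in).
spatial_logits = (spatial_feat ** 2).mean(axis=1).reshape(T, -1)  # (T, H*W)
spatial_attn = softmax(spatial_logits, axis=1).reshape(T, 1, H, W)

# Temporal attention: one weight per frame
# (optical-flow-guided in the paper; motion-feature energy as a stand-in).
temporal_logits = (temporal_feat ** 2).mean(axis=(1, 2, 3))       # (T,)
temporal_attn = softmax(temporal_logits).reshape(T, 1, 1, 1)

# Spatio-temporal interaction: each stream is re-weighted by the attention
# computed from the other stream's cue, as the abstract describes.
weighted_spatial = spatial_feat * temporal_attn    # frame-level weighting
weighted_temporal = temporal_feat * spatial_attn   # region-level weighting
```

Both attention tensors broadcast against the `(T, C, H, W)` features, so each stream keeps its shape while salient frames and regions are emphasized before classification.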
Pages: 9