Research on 3D Multi-Branch Aggregated Lightweight Network Video Action Recognition Algorithm

Authors
Hu Z.-P. [1, 2]
Diao P.-C. [1 ]
Zhang R.-X. [1 ]
Li S.-F. [1 ]
Zhao M.-Y. [1 ]
Affiliations
[1] School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004, Hebei
[2] Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao, 066004, Hebei
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2020, Vol. 48, No. 07
Keywords
Action recognition; Deep learning; Neural network
DOI
10.3969/j.issn.0372-2112.2020.07.003
Abstract
To build a video action recognition model that runs at the speed of a 2D neural network while retaining the performance of a 3D neural network, a 3D multi-branch aggregated lightweight network action recognition algorithm is proposed. First, grouped convolution divides the network into multiple branches. Second, to promote information exchange between branches, a multiplexer module with an information aggregation function is added. Finally, an adaptive attention mechanism is introduced to redirect channel and spatio-temporal information. Experiments show that the algorithm has a computational cost of 11.5 GFLOPs and an accuracy of 96.2% on the UCF101 dataset, and a computational cost of 11.5 GFLOPs and an accuracy of 74.7% on the HMDB51 dataset. Compared with other action recognition algorithms, it improves the efficiency of the video recognition network and shows advantages in both recognition speed and accuracy. © 2020, Chinese Institute of Electronics. All rights reserved.
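The abstract does not give layer dimensions, so the following is only a minimal sketch of why splitting a 3D convolution into grouped branches lowers cost. It uses the standard weight-count formula for a grouped convolution. The channel counts, kernel size, output shape, and the helper names `conv3d_params` and `conv3d_macs` are illustrative assumptions, not values or code from the paper.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Weight count of a 3D convolution (bias omitted).

    With `groups` branches, each output channel only sees
    c_in / groups input channels, so the count shrinks by that factor.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    return (c_in // groups) * c_out * kt * kh * kw


def conv3d_macs(c_in, c_out, out_shape, kernel=(3, 3, 3), groups=1):
    """Multiply-accumulates for one forward pass; out_shape = (T, H, W)."""
    t, h, w = out_shape
    return conv3d_params(c_in, c_out, kernel, groups) * t * h * w


# Hypothetical layer: 64 -> 64 channels, 3x3x3 kernel, 8x56x56 output volume.
dense = conv3d_macs(64, 64, (8, 56, 56))
grouped = conv3d_macs(64, 64, (8, 56, 56), groups=4)
print(dense // grouped)  # ratio equals the number of groups
```

Because grouped branches no longer share input channels, some cross-branch mixing step is needed afterwards, which is the role the abstract assigns to the multiplexer module.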
Pages: 1261-1268
Number of pages: 7