Research on 3D Multi-Branch Aggregated Lightweight Network Video Action Recognition Algorithm

Cited by: 0

Authors:

Hu Z.-P. [1,2]

Diao P.-C. [1]

Zhang R.-X. [1]

Li S.-F. [1]

Zhao M.-Y. [1]

Affiliations:

[1] School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004, Hebei

[2] Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao, 066004, Hebei

Source:

Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2020, Vol. 48, No. 07

Keywords:

Action recognition; Deep learning; Neural network

DOI:

10.3969/j.issn.0372-2112.2020.07.003

Abstract:

To build a video action recognition model that runs at the speed of a 2D neural network while retaining the performance of a 3D neural network, a 3D multi-branch aggregated lightweight network action recognition algorithm is proposed. First, the network is divided into multiple branches using grouped convolution. Second, to promote information exchange between branches, a multiplexer module with an information aggregation function is added. Finally, an adaptive attention mechanism is introduced to redirect channel and spatio-temporal information. Experiments show that the algorithm has a computational cost of 11.5 GFLOPs and an accuracy of 96.2% on the UCF101 dataset, and a computational cost of 11.5 GFLOPs and an accuracy of 74.7% on the HMDB51 dataset. Compared with other action recognition algorithms, it improves the efficiency of the video recognition network and offers advantages in both recognition speed and accuracy. © 2020, Chinese Institute of Electronics. All rights reserved.
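
As a rough illustration of why splitting a layer into branches with grouped convolution reduces cost, the sketch below counts the weights of a 3D convolution with and without groups. The channel widths, kernel size, and group count are illustrative assumptions, not values taken from the paper, and `conv3d_params` is a hypothetical helper.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Number of weights in a 3D convolution (bias ignored).

    With `groups` branches, each output channel only sees
    c_in // groups input channels, so the count shrinks by that factor.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    return (c_in // groups) * c_out * kt * kh * kw


# Assumed example sizes: 64 -> 64 channels, 3x3x3 kernel.
dense = conv3d_params(64, 64)              # single branch
grouped = conv3d_params(64, 64, groups=4)  # four branches (assumed)
print(dense, grouped, dense // grouped)
```

The branches of a grouped convolution do not share information with one another, which is the gap that the aggregation module described in the abstract is meant to address.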

Pages: 1261-1268

Page count: 7
