Multi-scale residual network model combined with Global Average Pooling for action recognition

Cited by: 0
Authors
Jianjun Li
Yu Han
Ming Zhang
Gang Li
Baohua Zhang
Affiliation
[1] Inner Mongolia University of Science & Technology, School of Electronic and Information Engineering
Source
Multimedia Tools and Applications | 2022 / Vol. 81
Keywords
Multi-scale; Global average pooling; Residual network; Interaction recognition
DOI
Not available
Abstract
Human action recognition is an active research topic in computer vision. However, the complexity of real environments and the diversity of actions mean that it still faces many challenges. In addition, traditional CNNs suffer from single-scale features, accuracy degradation as the network deepens, and an excessive number of parameters. To address these problems, this paper proposes a novel residual network model based on multi-scale feature fusion and global average pooling. The model uses a multi-scale feature fusion module to extract feature information at different scales, which enriches the spatio-temporal information. At the end of the network, global average pooling is used instead of a fully connected layer. Compared with a fully connected layer, global average pooling dilutes the combination of the relative positions of different features, so the features learned by the convolutional layers are more effective. Global average pooling also maps output channels directly to feature categories, which reduces the number of model parameters. The model is evaluated on the UT-Interaction dataset, UCF11 (YouTube Action dataset), the UCF101 dataset, and the CAVIAR dataset. The results show that, compared with state-of-the-art approaches, the proposed approach achieves high recognition accuracy and strong robustness, and performs well on datasets with complex backgrounds and diverse action categories.
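The parameter saving that the abstract attributes to global average pooling can be illustrated with a minimal sketch. This is not the authors' implementation: the feature-map sizes (512 channels on a 7x7 grid) and the helper names are assumptions chosen for illustration, and the 1x1 convolution that produces one map per class is one common way to pair a network with global average pooling, not a detail confirmed by this record.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each channel over its spatial dimensions.

    feature_maps: array of shape (channels, height, width).
    Returns an array of shape (channels,).
    """
    return feature_maps.mean(axis=(1, 2))


def fc_head_params(channels, height, width, num_classes):
    """Weights plus biases of a fully connected layer on the flattened maps."""
    return channels * height * width * num_classes + num_classes


def gap_head_params(channels, num_classes):
    """Weights plus biases of a 1x1 convolution that produces one map per
    class; the global average pooling that follows has no parameters."""
    return channels * num_classes + num_classes


# Hypothetical sizes (assumptions, not taken from the paper):
# 512 channels on a 7x7 grid, 101 classes as in UCF101.
C, H, W, K = 512, 7, 7, 101

rng = np.random.default_rng(0)
class_maps = rng.standard_normal((K, H, W))  # one feature map per class
scores = global_average_pool(class_maps)     # one score per class

fc_params = fc_head_params(C, H, W, K)
gap_params = gap_head_params(C, K)
print(scores.shape, fc_params, gap_params)
```

Because each class score is the mean of a single map, the spatial position of a response no longer has its own weight, which is the sense in which pooling dilutes the relative positions of features.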
Pages: 1375-1393
Page count: 18