3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition

Cited by: 0
Authors
Yang, Jun [1 ,2 ]
Sun, Shulong [2 ]
Chen, Jiayue [1 ]
Xie, Haizhen [1 ]
Wang, Yan [1 ]
Yang, Zenglong [1 ]
Affiliations
[1] China Univ Min & Technol, Big Data & Internet Things Res Ctr, Beijing 100083, Peoples R China
[2] Minist Emergency Management, Key Lab Intelligent Min & Robot, Beijing 100083, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, No. 16
Funding
National Natural Science Foundation of China;
Keywords
action recognition; spatiotemporal attention; multi-staged residual; skeleton; 3D CNN;
DOI
10.3390/app14167154
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Existing skeleton-based action recognition methods suffer from insufficient spatiotemporal feature mining and inefficient information transmission. To address these problems, this paper proposes the Spatial-Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). The model improves action recognition performance through three main innovations: (1) Conversion from skeleton points to heat maps. A Gaussian transform converts skeleton point data into heat maps, which reduces the model's strong dependence on the raw skeleton point data and enhances the stability and robustness of the data. (2) A spatiotemporal attention mechanism (STA). A novel spatiotemporal attention mechanism focuses on extracting key frames and key areas within frames, which significantly enhances the model's ability to identify behavioral patterns. (3) A multi-stage residual structure (MS-Residual). The multi-stage residual structure improves the efficiency of data transmission in the network, mitigates the vanishing gradient problem in deep networks, and helps improve the recognition efficiency of the model. Experimental results on the NTU-RGBD120 dataset show that 3D-STARNET significantly improves action recognition accuracy, with the overall network reaching a top-1 accuracy of 96.74%. The method not only addresses the robustness shortcomings of existing methods but also improves the ability to capture spatiotemporal features, providing an efficient and widely applicable solution for action recognition based on skeletal data.
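The abstract does not give the exact form of the Gaussian transform, so the following is only a minimal sketch of one common way to turn 2D skeleton points into heat maps: each joint becomes one map with an isotropic Gaussian centered on its coordinates. The function name `joints_to_heatmaps`, the `sigma` value, the map size, and the demo coordinates are all illustrative assumptions, not details taken from the paper. Plain Python lists are used to keep the example dependency-free; an array library would normally be used in practice.

```python
import math


def joints_to_heatmaps(joints, height, width, sigma=2.0):
    """Render one Gaussian heat map per (x, y) joint.

    Returns nested lists indexed as [joint][row][col].
    Hypothetical helper: the paper's actual transform may differ
    (for example, it may weight each map by a joint confidence score).
    """
    heatmaps = []
    for x, y in joints:
        heatmap = [
            [
                # Gaussian of the squared pixel distance to the joint.
                math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
                for col in range(width)
            ]
            for row in range(height)
        ]
        heatmaps.append(heatmap)
    return heatmaps


# Two made-up joints given as (x, y) pixel coordinates.
demo_joints = [(10, 20), (30.5, 12.25)]
demo_maps = joints_to_heatmaps(demo_joints, height=48, width=64, sigma=2.0)
```

Stacking such per-joint maps over consecutive frames gives a volume that a 3D CNN can consume. Because each joint is spread over a neighborhood of pixels, a small error in an estimated coordinate shifts the map only slightly, which is one plausible reading of the robustness claim above.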
Pages: 13