Multimodal human action recognition based on spatio-temporal action representation recognition model

Cited by: 8
Authors
Wu, Qianhan [1 ,2 ]
Huang, Qian [1 ,2 ]
Li, Xing [1 ,2 ]
Affiliations
[1] Hohai Univ, Key Lab Water Big Data Technol, Minist Water Resources, 8 West Focheng Rd, Nanjing 211106, Jiangsu, Peoples R China
[2] Hohai Univ, Sch Comp & Informat, 8 West Focheng Rd, Nanjing 211106, Jiangsu, Peoples R China
Keywords
Human action recognition; Multimode learning; HP-DMI; ST-GCN extractor; HTMCCA; CONVOLUTIONAL NEURAL-NETWORKS; RGB-D; DESCRIPTOR; MOTION; VIDEOS; CNN;
DOI
10.1007/s11042-022-14193-0
Chinese Library Classification (CLC): TP [Automation technology, computer technology]
Subject classification code: 0812
Abstract
Human action recognition methods based on single-modal data lack adequate information, so methods based on multimodal data, together with fusion algorithms that combine different features, are needed. Meanwhile, existing features extracted from depth videos and skeleton sequences are not sufficiently representative. In this paper, we propose a new model, the Spatio-temporal Action Representation Recognition Model, for recognizing human actions. The model introduces a new depth feature map called Hierarchical Pyramid Depth Motion Images (HP-DMI) to represent depth videos and adopts a Spatial-Temporal Graph Convolutional Network (ST-GCN) extractor to summarize skeleton features named Spatio-temporal Joint Descriptors (STJD). The Histogram of Oriented Gradients (HOG) is applied to HP-DMI to extract HP-DMI-HOG features. The two kinds of features are then fed into a fusion algorithm, High Trust Mean Canonical Correlation Analysis (HTMCCA), which mitigates the impact of noisy samples on multi-feature fusion and reduces computational complexity. Finally, a Support Vector Machine (SVM) is used for human action recognition. To evaluate the performance of our approach, several experiments are conducted on two public datasets; the results prove its effectiveness.
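The pipeline described in the abstract can be sketched end to end. The listing below is a minimal, hypothetical illustration only: it uses scikit-image's HOG as a stand-in for the HP-DMI-HOG step, random arrays in place of real HP-DMI maps and ST-GCN (STJD) skeleton features, and plain CCA from scikit-learn in place of the paper's HTMCCA fusion, whose trust-weighting details are not given in the abstract.

import numpy as np
from skimage.feature import hog
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_classes = 60, 6

# Stand-ins for the real inputs: HP-DMI depth maps and STJD skeleton features.
hp_dmi_maps = rng.random((n_samples, 32, 32))   # hypothetical HP-DMI images
stjd_feats = rng.random((n_samples, 128))       # hypothetical ST-GCN outputs
labels = rng.integers(0, n_classes, size=n_samples)

# Step 1: HOG descriptors computed on each HP-DMI map (HP-DMI-HOG).
hog_feats = np.stack([
    hog(m, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for m in hp_dmi_maps
])

# Step 2: fuse the two feature sets; plain CCA is only a placeholder for HTMCCA.
cca = CCA(n_components=16)
hog_c, stjd_c = cca.fit_transform(hog_feats, stjd_feats)
fused = np.concatenate([hog_c, stjd_c], axis=1)

# Step 3: SVM classifier on the fused representation.
clf = SVC(kernel="linear").fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))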
Pages: 16409-16430
Number of pages: 22