Multimodal human action recognition based on spatio-temporal action representation recognition model

Cited by: 8
Authors
Wu, Qianhan [1, 2]
Huang, Qian [1, 2]
Li, Xing [1, 2]
Affiliations
[1] Hohai Univ, Key Lab Water Big Data Technol, Minist Water Resources, 8 West Focheng Rd, Nanjing 211106, Jiangsu, Peoples R China
[2] Hohai Univ, Sch Comp & Informat, 8 West Focheng Rd, Nanjing 211106, Jiangsu, Peoples R China
Keywords
Human action recognition; Multimode learning; HP-DMI; ST-GCN extractor; HTMCCA; CONVOLUTIONAL NEURAL-NETWORKS; RGB-D; DESCRIPTOR; MOTION; VIDEOS; CNN;
DOI
10.1007/s11042-022-14193-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human action recognition methods based on single-modal data lack adequate information, so methods based on multimodal data, together with fusion algorithms that combine different features, are needed. Meanwhile, existing features extracted from depth videos and skeleton sequences are not sufficiently representative. In this paper, we propose a new model, the Spatio-temporal Action Representation Recognition Model, for recognizing human actions. The model introduces a new depth feature map, Hierarchical Pyramid Depth Motion Images (HP-DMI), to represent depth videos, and adopts a Spatial-temporal Graph Convolutional Networks (ST-GCN) extractor to summarize skeleton features, named Spatio-temporal Joint Descriptors (STJD). Histogram of Oriented Gradient (HOG) is applied to HP-DMI to extract HP-DMI-HOG features. The two kinds of features are then fed into a fusion algorithm, High Trust Mean Canonical Correlation Analysis (HTMCCA), which mitigates the impact of noisy samples on multi-feature fusion and reduces computational complexity. Finally, a Support Vector Machine (SVM) is used for human action recognition. To evaluate the performance of our approach, several experiments are conducted on two public datasets. The experimental results demonstrate its effectiveness.
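The abstract does not define how HP-DMI is constructed, so the following is only a rough sketch of the general idea behind a depth motion image: accumulating absolute differences between consecutive depth frames. The temporal pyramid used here (level k splits the clip into 2**k segments) and the names `depth_motion_image` and `temporal_pyramid_dmis` are assumptions for illustration, not the authors' method.

```python
from typing import List

Frame = List[List[float]]  # one depth frame as rows of depth values


def depth_motion_image(frames: List[Frame]) -> Frame:
    """Sum absolute differences between consecutive depth frames, pixel by pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    dmi = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                dmi[y][x] += abs(curr[y][x] - prev[y][x])
    return dmi


def temporal_pyramid_dmis(frames: List[Frame], levels: int = 2) -> List[Frame]:
    """Hypothetical pyramid (an assumption): level k splits the clip into 2**k segments."""
    maps = []
    for level in range(levels):
        parts = 2 ** level
        size = len(frames) // parts
        for p in range(parts):
            start = p * size
            # The last segment absorbs any leftover frames.
            end = len(frames) if p == parts - 1 else start + size
            maps.append(depth_motion_image(frames[start:end]))
    return maps


# Toy clip: four 2x2 frames in which only the top-left pixel changes over time.
clip = [[[float(t), 0.0], [0.0, 0.0]] for t in range(4)]
maps = temporal_pyramid_dmis(clip, levels=2)
```

In a pipeline like the one the abstract outlines, each resulting map would then be passed to a HOG descriptor before fusion with the skeleton features; those stages are not shown here.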
Pages: 16409-16430
Number of pages: 22