Multimodal human action recognition based on spatio-temporal action representation recognition model

Cited by: 8
Authors
Wu, Qianhan [1, 2]
Huang, Qian [1, 2]
Li, Xing [1, 2]
Affiliations
[1] Hohai Univ, Key Lab Water Big Data Technol, Minist Water Resources, 8 West Focheng Rd, Nanjing 211106, Jiangsu, Peoples R China
[2] Hohai Univ, Sch Comp & Informat, 8 West Focheng Rd, Nanjing 211106, Jiangsu, Peoples R China
Keywords
Human action recognition; Multimode learning; HP-DMI; ST-GCN extractor; HTMCCA; CONVOLUTIONAL NEURAL-NETWORKS; RGB-D; DESCRIPTOR; MOTION; VIDEOS; CNN
DOI
10.1007/s11042-022-14193-0
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human action recognition methods based on single-modal data lack adequate information, so methods that draw on multimodal data, together with fusion algorithms that combine the different features, are needed. Meanwhile, existing features extracted from depth videos and skeleton sequences are not sufficiently representative. In this paper, we propose a new model, the Spatio-temporal Action Representation Recognition Model, for recognizing human actions. The model introduces a new depth feature map, Hierarchical Pyramid Depth Motion Images (HP-DMI), to represent depth videos, and adopts a Spatial-temporal Graph Convolutional Networks (ST-GCN) extractor to summarize skeleton features, named Spatio-temporal Joint Descriptors (STJD). Histogram of Oriented Gradient (HOG) is applied to HP-DMI to extract HP-DMI-HOG features. The two kinds of features are then fed into a fusion algorithm, High Trust Mean Canonical Correlation Analysis (HTMCCA), which mitigates the impact of noisy samples on multi-feature fusion and reduces computational complexity. Finally, a Support Vector Machine (SVM) is used for human action recognition. To evaluate the performance of our approach, several experiments are conducted on two public datasets. The experimental results demonstrate its effectiveness.
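The abstract names the pipeline stages but not their exact construction, so the following is only a rough sketch of two of the underlying ideas, under stated assumptions. `depth_motion_image` accumulates absolute frame differences into a single map, which is a generic depth motion image and not the paper's HP-DMI. `cca_fuse` uses plain canonical correlation analysis with concatenated projections as a stand-in for HTMCCA. Both function names, the regularization term `reg`, and the choice to concatenate rather than sum the projections are assumptions for illustration.

```python
import numpy as np


def depth_motion_image(depth_video: np.ndarray) -> np.ndarray:
    """Accumulate absolute frame-to-frame depth differences into one 2D map.

    depth_video has shape (T, H, W). This is a generic depth motion image,
    not HP-DMI, whose hierarchical pyramid is not specified in the abstract.
    """
    frames = depth_video.astype(np.float64)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)


def cca_fuse(X: np.ndarray, Y: np.ndarray, k: int, reg: float = 1e-6) -> np.ndarray:
    """Project two feature sets with plain CCA and concatenate the projections.

    X is (n, p) and Y is (n, q), with k <= min(p, q). Standard CCA is used
    here as a stand-in; it has no mechanism for down-weighting noisy samples.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Regularized covariance and cross-covariance estimates.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with its Cholesky factor, then take the SVD of the
    # whitened cross-covariance; singular values are canonical correlations.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return np.hstack([Xc @ Wx, Yc @ Wy])
```

In a full pipeline, a HOG descriptor computed on the depth map and a skeleton descriptor would be passed as `X` and `Y`, and the fused vectors would then go to a classifier such as an SVM.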
Pages: 16409-16430
Page count: 22