Semantic-guided multi-scale human skeleton action recognition

Times Cited: 8
Authors
Qi, Yongfeng [1 ]
Hu, Jinlin [1 ]
Zhuang, Liqiang [1 ]
Pei, Xiaoxu [1 ]
Affiliations
[1] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Gansu, Peoples R China
Keywords
Human skeleton; Action recognition; Semantic information; Multi-scale neural network; Multi-scale receptive field; GRAPH CONVOLUTIONAL NETWORKS; LSTM; FUSION; GCN;
DOI
10.1007/s10489-022-03968-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the development of depth sensors and pose estimation algorithms, action recognition based on the human skeleton has attracted wide attention from researchers. Skeleton action recognition methods that embed semantic information achieve excellent performance, in both computational cost and recognition accuracy, by extracting spatio-temporal features of all joints; however, this causes information redundancy and limits the extraction of long-term contextual spatio-temporal features. In this work, we propose a semantic-guided multi-scale neural network (SGMSN) for skeleton action recognition. For spatial modeling, the key insight of our approach is to achieve multi-scale graph convolution by manipulating the data level, without adding computational cost. For temporal modeling, we build a multi-scale temporal convolutional network with a multi-scale receptive field across the temporal dimension. Experiments were carried out on two publicly available large-scale skeleton datasets, NTU RGB+D and NTU RGB+D 120. On the NTU RGB+D dataset, the accuracy is 90.1% (cross-subject) and 95.8% (cross-view). The experimental results show that the proposed network architecture outperforms most current state-of-the-art action recognition models.
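The multi-scale temporal modeling mentioned in the abstract (a temporal convolutional network whose branches have different receptive fields) is commonly realized with dilated convolutions along the frame axis. The sketch below is an illustrative assumption, not the paper's actual SGMSN implementation: it applies a shared depthwise 1D kernel at several dilation rates and concatenates the branch outputs, so each branch sees a different temporal span.

```python
import numpy as np

def temporal_conv(x, kernel, dilation):
    """Depthwise 1D convolution along the time axis with 'same' zero padding.
    x: (C, T) feature map; kernel: (K,) weights shared across channels."""
    K = len(kernel)
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    C, T = x.shape
    out = np.zeros((C, T))
    for k in range(K):
        # Each tap is offset by the dilation rate, widening the receptive field.
        out += kernel[k] * xp[:, k * dilation : k * dilation + T]
    return out

def multi_scale_temporal(x, dilations=(1, 2, 4), kernel=None):
    """Concatenate temporal-conv branches with growing receptive fields.
    A branch with dilation d covers d*(K-1)+1 frames per output step."""
    if kernel is None:
        kernel = np.ones(3) / 3.0  # simple averaging kernel for this sketch
    return np.concatenate([temporal_conv(x, kernel, d) for d in dilations], axis=0)

feats = np.random.randn(64, 300)   # 64 channels, 300 frames (NTU-style clip length)
out = multi_scale_temporal(feats)
print(out.shape)                   # (192, 300): 3 branches x 64 channels
```

In a full model each branch would have learned weights and the concatenation would be followed by a 1x1 convolution to fuse scales; here the fixed averaging kernel only demonstrates how the dilation rates stack receptive fields without increasing the per-branch parameter count.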
Pages: 9763-9778
Page count: 16
Related Papers
65 records in total
[31]  
Liao RJ, 2019, arXiv:1901.01484
[32]   NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding [J].
Liu, Jun ;
Shahroudy, Amir ;
Perez, Mauricio ;
Wang, Gang ;
Duan, Ling-Yu ;
Kot, Alex C. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (10) :2684-2701
[33]   A Multi-Stream Graph Convolutional Networks-Hidden Conditional Random Field Model for Skeleton-Based Action Recognition [J].
Liu, Kai ;
Gao, Lei ;
Khan, Naimul Mefraz ;
Qi, Lin ;
Guan, Ling .
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 :64-76
[34]   Adaptive multi-view graph convolutional networks for skeleton-based action recognition [J].
Liu, Xing ;
Li, Yanshan ;
Xia, Rongjie .
NEUROCOMPUTING, 2021, 444 :288-300
[35]  
Liu Y., 2021, arXiv
[36]   Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [J].
Liu, Ziyu ;
Zhang, Hongwen ;
Chen, Zhenghao ;
Wang, Zhiyong ;
Ouyang, Wanli .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :140-149
[37]   Learning representations from quadrilateral based geometric features for skeleton-based action recognition using LSTM networks [J].
Naveenkumar, M. ;
Domnic, S. .
INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2020, 14 (01) :47-54
[38]   Tripool: Graph triplet pooling for 3D skeleton-based action recognition [J].
Peng, Wei ;
Hong, Xiaopeng ;
Zhao, Guoying .
PATTERN RECOGNITION, 2021, 115
[39]  
Peng W, 2020, AAAI CONF ARTIF INTE, V34, P2669
[40]   Skeleton-based action recognition via spatial and temporal transformer networks [J].
Plizzari, Chiara ;
Cannici, Marco ;
Matteucci, Matteo .
COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 208