Skeleton-based action recognition with multi-stream, multi-scale dilated spatial-temporal graph convolution network

Cited by: 7

Authors:

Zhang, Haiping [1,2]

Liu, Xu [3]

Yu, Dongjin [1]

Guan, Liming [2]

Wang, Dongjing [1]

Ma, Conghao [3]

Hu, Zepeng [1]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Zhejiang, Peoples R China

[2] Hangzhou Dianzi Univ, Sch Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China

[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Zhejiang, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Vol. 53 / Issue 14

Keywords:

Action recognition; Skeleton; GCN; Multi-stream network; Visual surveillance

DOI:

10.1007/s10489-022-04365-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Action recognition techniques based on skeleton data are receiving increasing attention in computer vision because they adapt well to dynamic environments and complex backgrounds. Representing human skeleton data as spatial-temporal graphs and processing them with graph convolutional networks (GCNs) has been shown to produce good recognition results. However, existing GCN methods often use a fixed-size convolution kernel to extract temporal features, which may not suit multi-level model structures. In addition, fusing the streams of a multi-stream network in equal proportion may ignore differences in the recognition ability of the individual streams. Both issues affect the final recognition result. In this paper, we propose (1) a multi-scale dilated temporal graph convolution layer (MDTGCL) and (2) a multi-branch feature fusion (MFF) structure. The MDTGCL uses multiple convolution kernels and dilated convolution to better adapt to the multi-layer structure of the GCN model and to capture spatial-temporal context over longer periods, yielding richer behavioural features. MFF performs weighted fusion of the multi-stream outputs to obtain the final recognition result. Because higher-order skeleton data are highly discriminative and more conducive to human action recognition, we jointly model the spatial information of joints and bones, their multiple motion information, and the angle information of bones. Combining the above, we designed a multi-stream, multi-scale dilated spatial-temporal graph convolutional network (2M-STGCN) and conducted extensive experiments on two large datasets (NTU RGB+D 60 and Kinetics Skeleton 400), which showed that our model performs at a state-of-the-art (SOTA) level.
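
The two ideas named in the abstract, multi-scale dilated temporal convolution and weighted fusion of stream outputs, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: this record gives no layer details, so the function names, the kernel sizes and dilation rates, the averaging of branch outputs, and the fusion weights are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1D convolution along the time axis.

    Kernel taps are spaced `dilation` frames apart, so a 3-tap kernel with
    dilation 2 looks at frames t-2, t, and t+2.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd-length kernel")
    half = (len(kernel) - 1) // 2 * dilation
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + i * dilation - half
            if 0 <= idx < len(signal):  # frames outside the clip count as 0
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_temporal(signal: Sequence[float],
                         branches: Sequence[Tuple[Sequence[float], int]]
                         ) -> List[float]:
    """Run several (kernel, dilation) branches and average them per frame.

    Averaging is an assumption of this sketch; a real layer might
    concatenate or sum the branch outputs instead.
    """
    outputs = [dilated_conv1d(signal, k, d) for k, d in branches]
    return [sum(vals) / len(outputs) for vals in zip(*outputs)]


def weighted_fusion(stream_scores: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Fuse per-class scores from several streams with per-stream weights."""
    num_classes = len(next(iter(stream_scores.values())))
    fused = [0.0] * num_classes
    for name, scores in stream_scores.items():
        for c, s in enumerate(scores):
            fused[c] += weights[name] * s
    return fused


if __name__ == "__main__":
    # One joint coordinate over 7 frames (hypothetical values).
    track = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(multi_scale_temporal(track, [([1, 1, 1], 1), ([1, 1, 1], 2)]))

    # Hypothetical two-class scores from two streams, with unequal weights.
    scores = {"joint": [0.6, 0.4], "bone": [0.2, 0.8]}
    print(weighted_fusion(scores, {"joint": 0.7, "bone": 0.3}))
```

With the same 3-tap kernel, the branch with dilation 2 covers a five-frame window instead of three, which is the sense in which dilation gives a longer temporal context without adding parameters. Setting all fusion weights equal recovers the equal-proportion fusion that the abstract argues against.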


Pages: 17629-17643

Number of pages: 15

