Spatial-temporal graph transformer network for skeleton-based temporal action segmentation

Cited by: 2

Authors:

Tian, Xiaoyan [1]

Jin, Ye [1]

Zhang, Zhao [2]

Liu, Peng [1]

Tang, Xianglong [1]

Affiliations:

[1] Harbin Inst Technol, Fac Comp, 92 West Da Zhi St, Harbin 150001, Peoples R China

[2] Harbin Inst Technol, Sch Instrument Sci & Engn, 92 West Da Zhi St, Harbin 150001, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Vol. 83 / Issue 15

Funding:

Natural Science Foundation of Heilongjiang Province; National Natural Science Foundation of China;

Keywords:

Skeleton-based temporal action segmentation; Spatial-temporal graph; Transformer; Spatial-temporal correlation; Over-segmentation errors; Ambiguous boundaries;

DOI:

10.1007/s11042-023-17276-8

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Temporal action segmentation (TAS) of minute-long untrimmed videos involves locating and classifying human action segments using multiple action class labels. Previously, research on this task typically involved generating an initial estimate using designed temporal convolutional layers and gradually refining this estimate solely based on RGB features. This approach, however, exhibits several limitations, including the inability to capture inherent long-range dependencies and insufficient consideration of intricate spatial-temporal correlations in the changing relationships between human joints. To address these constraints, we introduce a novel spatial-temporal graph transformer network (STGT) for the skeleton-based TAS task. Our STGT employs a series of skeleton graph transformer blocks (SGT blocks) within an encoder-decoder architecture. Particularly, the spatial-temporal graph layer with an adaptive graph strategy enhances the graph structure, rendering it more flexible and robust. Additionally, the spatial-temporal transformer layer in the SGT block constructs parallel attention mechanisms to model the dynamic spatial and non-linear temporal correlations. Integrating these advancements into the TAS task represents a notable achievement. Experimental results on three challenging datasets (PKU-MMD, HuGaDB, and LARa) indicate the improved performance of the proposed framework compared with that of existing TAS models (MS-TCN, ASRF, BCN, ETSN, and ASFormer). Furthermore, our approach effectively addresses concerns regarding over-segmentation errors and ambiguous boundaries.
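As a rough illustration of the over-segmentation problem the abstract mentions, the sketch below collapses frame-wise labels into segments and computes a segmental edit score, a metric commonly reported in the TAS literature. This record does not state which metrics the paper uses, so the choice of metric, the function names `frames_to_segments` and `edit_score`, and the toy labels are all assumptions made for illustration only.

```python
from itertools import groupby


def frames_to_segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def edit_score(pred, truth):
    """Segmental edit score in [0, 100], based on Levenshtein distance
    between the two sequences of segment labels (hypothetical helper)."""
    p = [seg[0] for seg in frames_to_segments(pred)]
    t = [seg[0] for seg in frames_to_segments(truth)]
    if not p and not t:
        return 100.0
    # Row-by-row dynamic programme for the edit distance.
    prev = list(range(len(t) + 1))
    for i, a in enumerate(p, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute
        prev = cur
    return 100.0 * (1 - prev[-1] / max(len(p), len(t)))


# Toy example (made-up labels): one wrong frame splits "walk" into three pieces.
truth = ["walk"] * 4 + ["sit"] * 4
pred = ["walk", "walk", "sit", "walk"] + ["sit"] * 4

frame_acc = 100.0 * sum(a == b for a, b in zip(pred, truth)) / len(truth)
print(frame_acc)                      # 87.5
print(len(frames_to_segments(pred)))  # 4 predicted segments vs. 2 true ones
print(edit_score(pred, truth))        # 50.0
```

In this toy case a single mislabeled frame leaves frame-wise accuracy at 87.5 but halves the edit score, which is why segment-level measures are typically used to expose over-segmentation.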


Pages: 44273-44297

Number of pages: 25

50 items in total

[1] Spatial-temporal graph transformer network for skeleton-based temporal action segmentation
Xiaoyan Tian
Ye Jin
Zhao Zhang
Peng Liu
Xianglong Tang
Multimedia Tools and Applications, 2024, 83 : 44273 - 44297
[2] Hierarchical Spatial-Temporal Network for Skeleton-Based Temporal Action Segmentation
Tan, Chenwei
Sun, Tao
Fu, Talas
Wang, Yuhan
Xu, Minjie
Liu, Shenglan
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT X, 2024, 14434 : 28 - 39
[3] Pyramid Spatial-Temporal Graph Transformer for Skeleton-Based Action Recognition
Chen, Shuo
Xu, Ke
Jiang, Xinghao
Sun, Tanfeng
APPLIED SCIENCES-BASEL, 2022, 12 (18):
[4] Spatial-Temporal gated graph attention network for skeleton-based action recognition
Rahevar, Mrugendrasinh
Ganatra, Amit
PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 929 - 939
[5] Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition
Rahevar, Mrugendrasinh
Ganatra, Amit
Saba, Tanzila
Rehman, Amjad
Bahaj, Saeed Ali
IEEE ACCESS, 2023, 11 : 21546 - 21553
[6] Spatial-temporal slowfast graph convolutional network for skeleton-based action recognition
Fang, Zheng
Zhang, Xiongwei
Cao, Tieyong
Zheng, Yunfei
Sun, Meng
IET COMPUTER VISION, 2022, 16 (03) : 205 - 217
[7] Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-Based Action Recognition
Hang, Rui
Li, MinXian
COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 172 - 188
[8] Dynamic spatial-temporal topology graph network for skeleton-based action recognition
Chen, Lian
Lu, Ke
Niu, Zehai
Wei, Runchen
Xue, Jian
MULTIMEDIA SYSTEMS, 2024, 30 (06)
[9] Multilevel Spatial-Temporal Excited Graph Network for Skeleton-Based Action Recognition
Zhu, Yisheng
Shuai, Hui
Liu, Guangcan
Liu, Qingshan
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 496 - 508
[10] A motion-aware and temporal-enhanced Spatial-Temporal Graph Convolutional Network for skeleton-based human action segmentation
Chai, Shurong
Jain, Rahul Kumar
Liu, Jiaqing
Teng, Shiyu
Tateyama, Tomoko
Li, Yinhao
Chen, Yen-Wei
NEUROCOMPUTING, 2024, 580
