Spatial-temporal graph transformer network for skeleton-based temporal action segmentation

Cited by: 2
Authors
Tian, Xiaoyan [1]
Jin, Ye [1]
Zhang, Zhao [2]
Liu, Peng [1]
Tang, Xianglong [1]
Affiliations
[1] Harbin Inst Technol, Fac Comp, 92 West Da Zhi St, Harbin 150001, Peoples R China
[2] Harbin Inst Technol, Sch Instrument Sci & Engn, 92 West Da Zhi St, Harbin 150001, Peoples R China
Funding
Natural Science Foundation of Heilongjiang Province; National Natural Science Foundation of China
Keywords
Skeleton-based temporal action segmentation; Spatial-temporal graph; Transformer; Spatial-temporal correlation; Over-segmentation errors; Ambiguous boundaries
DOI
10.1007/s11042-023-17276-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Temporal action segmentation (TAS) of minute-long untrimmed videos involves locating and classifying human action segments using multiple action class labels. Previous research on this task typically generated an initial estimate with purpose-designed temporal convolutional layers and then refined it progressively using RGB features alone. This approach, however, has several limitations: it cannot capture inherent long-range dependencies, and it gives insufficient consideration to the intricate spatial-temporal correlations in the changing relationships between human joints. To address these constraints, we introduce a novel spatial-temporal graph transformer network (STGT) for the skeleton-based TAS task. STGT stacks a series of skeleton graph transformer blocks (SGT blocks) within an encoder-decoder architecture. In particular, the spatial-temporal graph layer uses an adaptive graph strategy that makes the graph structure more flexible and robust. In addition, the spatial-temporal transformer layer in each SGT block builds parallel attention mechanisms to model dynamic spatial correlations and non-linear temporal correlations. Bringing these components to the TAS task is a notable contribution of this work. Experimental results on three challenging datasets (PKU-MMD, HuGaDB, and LARa) show that the proposed framework outperforms existing TAS models (MS-TCN, ASRF, BCN, ETSN, and ASFormer). Furthermore, our approach effectively mitigates over-segmentation errors and ambiguous boundaries.
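The abstract describes the adaptive graph layer and the parallel attention layer only at a high level. The following is a minimal NumPy sketch under stated assumptions, not the authors' STGT implementation. The adaptive graph is modeled as a normalized fixed skeleton adjacency plus a learnable offset `B` plus a per-frame, data-dependent affinity, a construction borrowed from adaptive graph convolution work such as 2s-AGCN and not confirmed as STGT's. The spatial and temporal attention branches are single-head and fused by summation. All function names, shapes, and weights are hypothetical and randomly initialized. The sketch includes no training, no encoder-decoder, and no stacking of multiple blocks.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def normalize_adjacency(A):
    # Symmetric normalization D^-1/2 (A + I) D^-1/2 of a fixed skeleton graph.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def adaptive_graph_conv(X, A, B, W_theta, W_phi, W):
    """Hypothetical adaptive spatial graph layer (assumed, 2s-AGCN style).

    X: (T, V, C) features for T frames and V joints.
    A: (V, V) fixed skeleton adjacency.
    B: (V, V) learnable, unconstrained offset to the graph.
    W_theta, W_phi: (C, C_e) embeddings for a per-frame data-dependent graph.
    W: (C, C_out) output projection.
    """
    A_norm = normalize_adjacency(A)
    theta, phi = X @ W_theta, X @ W_phi                  # (T, V, C_e) each
    C_data = softmax(theta @ phi.transpose(0, 2, 1))     # (T, V, V)
    A_adapt = A_norm[None] + B[None] + C_data            # broadcast over frames
    return np.maximum(A_adapt @ X @ W, 0.0)              # ReLU, (T, V, C_out)


def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the second-to-last axis.
    Q, K, Vv = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(Q.shape[-1])
    return softmax(scores) @ Vv


def parallel_st_attention(X, spatial_w, temporal_w):
    # Spatial branch: joints attend to joints within each frame; X is (T, V, C).
    S = self_attention(X, *spatial_w)
    # Temporal branch: each joint attends across all frames (swap T and V).
    Tm = self_attention(X.transpose(1, 0, 2), *temporal_w).transpose(1, 0, 2)
    # Fuse the two parallel branches by summation (an assumption).
    return S + Tm


def frame_logits(Y, W_cls):
    # Pool over joints, then score every frame: TAS assigns one label per frame.
    return Y.mean(axis=1) @ W_cls                        # (T, num_classes)


rng = np.random.default_rng(0)
T, V, C, D, N_CLS = 8, 5, 4, 6, 3
X = rng.standard_normal((T, V, C))
A = np.zeros((V, V))
for i, j in [(0, 1), (1, 2), (1, 3), (1, 4)]:            # toy 5-joint skeleton
    A[i, j] = A[j, i] = 1.0
B = np.zeros((V, V))

H = adaptive_graph_conv(X, A, B, rng.standard_normal((C, 3)),
                        rng.standard_normal((C, 3)), rng.standard_normal((C, D)))
Y = parallel_st_attention(H, [rng.standard_normal((D, D)) for _ in range(3)],
                          [rng.standard_normal((D, D)) for _ in range(3)])
labels = frame_logits(Y, rng.standard_normal((D, N_CLS))).argmax(axis=-1)
print(Y.shape, labels.shape)  # (8, 5, 6) (8,)
```

In this sketch the two attention branches run side by side rather than one after the other. The spatial branch relates joints within a frame, and the temporal branch relates the same joint across frames. The final per-frame `argmax` reflects that TAS outputs one action label per frame rather than one label per clip.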
Pages: 44273-44297
Number of pages: 25
Related papers (50 in total)
  • [1] Spatial-temporal graph transformer network for skeleton-based temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Zhang, Zhao
    Liu, Peng
    Tang, Xianglong
    Multimedia Tools and Applications, 2024, 83 : 44273 - 44297
  • [2] Hierarchical Spatial-Temporal Network for Skeleton-Based Temporal Action Segmentation
    Tan, Chenwei
    Sun, Tao
    Fu, Talas
    Wang, Yuhan
    Xu, Minjie
    Liu, Shenglan
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT X, 2024, 14434 : 28 - 39
  • [3] Pyramid Spatial-Temporal Graph Transformer for Skeleton-Based Action Recognition
    Chen, Shuo
    Xu, Ke
    Jiang, Xinghao
    Sun, Tanfeng
    APPLIED SCIENCES-BASEL, 2022, 12 (18):
  • [4] Spatial-Temporal gated graph attention network for skeleton-based action recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 929 - 939
  • [5] Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition
    Rahevar, Mrugendrasinh
    Ganatra, Amit
    Saba, Tanzila
    Rehman, Amjad
    Bahaj, Saeed Ali
    IEEE ACCESS, 2023, 11 : 21546 - 21553
  • [6] Spatial-temporal slowfast graph convolutional network for skeleton-based action recognition
    Fang, Zheng
    Zhang, Xiongwei
    Cao, Tieyong
    Zheng, Yunfei
    Sun, Meng
    IET COMPUTER VISION, 2022, 16 (03) : 205 - 217
  • [7] Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-Based Action Recognition
    Hang, Rui
    Li, MinXian
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 172 - 188
  • [8] Dynamic spatial-temporal topology graph network for skeleton-based action recognition
    Chen, Lian
    Lu, Ke
    Niu, Zehai
    Wei, Runchen
    Xue, Jian
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [9] Multilevel Spatial-Temporal Excited Graph Network for Skeleton-Based Action Recognition
    Zhu, Yisheng
    Shuai, Hui
    Liu, Guangcan
    Liu, Qingshan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 496 - 508
  • [10] A motion-aware and temporal-enhanced Spatial-Temporal Graph Convolutional Network for skeleton-based human action segmentation
    Chai, Shurong
    Jain, Rahul Kumar
    Liu, Jiaqing
    Teng, Shiyu
    Tateyama, Tomoko
    Li, Yinhao
    Chen, Yen-Wei
    NEUROCOMPUTING, 2024, 580