Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation

Authors
Ji, Haoyu [1]
Chen, Bowen [1]
Xu, Xinglong [1]
Ren, Weihong [1]
Wang, Zhiyong [1]
Liu, Honghai [1]
Affiliations
[1] Harbin Inst Technol, State Key Lab Robot & Syst, Shenzhen 518055, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video Understanding; Skeleton-based Action Segmentation; Language-Assisted Learning; Attention; Contrastive Learning;
DOI
10.1007/978-3-031-72949-2_23
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skeleton-based Temporal Action Segmentation (STAS) aims to densely segment and classify human actions in long, untrimmed skeletal motion sequences. Existing STAS methods primarily model spatial dependencies among joints and temporal relationships among frames to generate frame-level one-hot classifications. However, these methods overlook the deep mining of semantic relations among joints as well as actions at a linguistic level, which limits the comprehensiveness of skeleton action understanding. In this work, we propose a Language-assisted Skeleton Action Understanding (LaSA) method that leverages the language modality to assist in learning semantic relationships among joints and actions. Specifically, in terms of joint relationships, the Joint Relationships Establishment (JRE) module establishes correlations among joints in the feature sequence by applying attention between joint texts and differentiates distinct joints by embedding joint texts as positional embeddings. Regarding action relationships, the Action Relationships Supervision (ARS) module enhances the discrimination across action classes through contrastive learning of single-class action-text pairs and models the semantic associations of adjacent actions by contrasting mixed-class clip-text pairs. Performance evaluation on five public datasets demonstrates that LaSA achieves state-of-the-art results. Code is available at https://github.com/HaoyuJi/LaSA.
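The abstract describes contrastive learning over action-text pairs without giving the loss itself. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style loss between a batch of action features and the text embeddings of their class descriptions. It is not taken from the LaSA code: the function names, the temperature value of 0.07, and the use of plain NumPy arrays in place of learned encoders are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(action_feats, text_feats, temperature=0.07):
    """Generic symmetric InfoNCE loss (hypothetical helper, not the paper's code).

    action_feats: (N, D) array, one feature vector per action segment.
    text_feats:   (N, D) array, row i is the text embedding paired with
                  action_feats[i]; all other rows act as negatives.
    temperature:  assumed value; the abstract does not specify one.
    """
    a = l2_normalize(action_feats)
    t = l2_normalize(text_feats)
    logits = a @ t.T / temperature          # (N, N) scaled cosine similarities
    idx = np.arange(logits.shape[0])        # matching pairs sit on the diagonal
    loss_a2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # action -> text
    loss_t2a = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> action
    return 0.5 * (loss_a2t + loss_t2a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))
    # Correctly paired features should score a lower loss than shuffled pairs.
    print(symmetric_contrastive_loss(feats, feats))
    print(symmetric_contrastive_loss(feats, np.roll(feats, 1, axis=0)))
```

Minimizing a loss of this kind pulls each action feature toward its own text embedding and pushes it away from the text of other classes, which is one common way to obtain the cross-class discrimination the abstract refers to.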
Pages: 400-417
Number of pages: 18