Language-guided temporal primitive modeling for skeleton-based action recognition

Cited by: 0
Authors
Pan, Qingzhe [1 ]
Xie, Xuemei [2 ,3 ]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Peoples R China
[2] Xidian Univ, Guangzhou Inst Technol, Guangzhou 510700, Peoples R China
[3] Pazhou Lab Huangpu, Guangzhou 510555, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based action recognition; Temporal primitive; Language guidance; Large language model;
DOI
10.1016/j.neucom.2024.128636
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Human actions describe complex body dynamics and can be defined as an ordered set of sub-action primitives. Existing methods for skeleton-based action recognition primarily focus on designing networks that learn a representation of the entire action aligned with the action label. However, training on the heterogeneous action as a whole increases the burden on the network and hinders learning of the action's temporal structure. In this paper, the traditional action label is extended into an ordered primitive language description by employing a large language model, which not only unveils the temporal structure of the action but also enriches its semantics. Based on this, we propose a language-guided skeleton representation learning network (LGS-Net) that leverages the language description to guide skeleton feature learning at both the ordered primitive and global action levels. In particular, the primitive guidance aligns the features of skeleton clips with those of the ordered primitive descriptions while preserving temporal order, which encourages the network to model temporal primitives and capture the internal structure of the action from skeletons. To enhance cross-modal alignment, we develop a temporal alignment loss supplemented with diversity and sparsity regularization terms to produce a discriminative multi-modal representation. Evaluated on three benchmark datasets, NTU-60, NTU-120 and N-UCLA, the proposed LGS-Net achieves results comparable to state-of-the-art methods, demonstrating the effectiveness of the language-guided learning mechanism.
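The abstract does not give the exact formulation of the temporal alignment loss, so the following is a minimal illustrative sketch, assuming one plausible PyTorch form: each skeleton clip is softly aligned to the primitive description at the same temporal position, a diversity term keeps the primitive features from collapsing onto each other, and a sparsity term keeps each clip's soft alignment concentrated on a few primitives. The function name `temporal_alignment_loss` and the weights `tau`, `lambda_div`, and `lambda_sparse` are hypothetical and not taken from the paper.

```python
# Illustrative sketch (NOT the paper's exact formulation): a temporal alignment
# loss between T skeleton-clip features and T ordered primitive-text features,
# with diversity and sparsity regularization terms.
import torch
import torch.nn.functional as F

def temporal_alignment_loss(clip_feats, text_feats, tau=0.07,
                            lambda_div=0.1, lambda_sparse=0.01):
    """clip_feats, text_feats: (T, D) tensors assumed to share the same
    temporal order, i.e. clip t corresponds to primitive description t."""
    clip = F.normalize(clip_feats, dim=-1)
    text = F.normalize(text_feats, dim=-1)

    # Similarity between every clip and every primitive description.
    sim = clip @ text.t() / tau                      # (T, T)

    # Order-preserving alignment: clip t should match primitive t (diagonal).
    target = torch.arange(sim.size(0), device=sim.device)
    loss_align = F.cross_entropy(sim, target)

    # Diversity: discourage different primitive features from collapsing together.
    gram = text @ text.t()                           # (T, T), diagonal is 1
    off_diag = gram - torch.eye(gram.size(0), device=gram.device)
    loss_div = off_diag.pow(2).mean()

    # Sparsity: low entropy of each clip's soft alignment over primitives,
    # so each clip attends to only a few primitive descriptions.
    align = sim.softmax(dim=-1)
    loss_sparse = -(align * (align + 1e-8).log()).sum(dim=-1).mean()

    return loss_align + lambda_div * loss_div + lambda_sparse * loss_sparse

# Example usage with random features for 4 clips and 4 primitive descriptions.
if __name__ == "__main__":
    clips = torch.randn(4, 256)
    texts = torch.randn(4, 256)
    print(temporal_alignment_loss(clips, texts))
```

In a setup like the one the abstract describes, such an objective would presumably be combined with the global action-level guidance and a standard classification loss, with the regularizer weights tuned on the target dataset.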
Pages: 12
Related papers
50 total
  • [1] Multi-cue temporal modeling for skeleton-based sign language recognition
    Ozdemir, Ogulcan
    Baytas, Inci M.
    Akarun, Lale
    FRONTIERS IN NEUROSCIENCE, 2023, 17
  • [2] Temporal Extension Module for Skeleton-Based Action Recognition
    Obinata, Yuya
    Yamamoto, Takuma
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 534 - 540
  • [3] Combining Adaptive Graph Convolution and Temporal Modeling for Skeleton-Based Action Recognition
    Zhen, Haoyu
    Zhang, De
COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59 (18) : 137 - 144
  • [4] Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation
    Ji, Haoyu
    Chen, Bowen
    Xu, Xinglong
    Ren, Weihong
    Wang, Zhiyong
    Liu, Honghai
    COMPUTER VISION - ECCV 2024, PT LIV, 2025, 15112 : 400 - 417
  • [5] Multi-Grained Temporal Segmentation Attention Modeling for Skeleton-Based Action Recognition
    Lv, Jinrong
    Gong, Xun
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 927 - 931
  • [6] Unsupervised Temporal Adaptation in Skeleton-Based Human Action Recognition
    Tian, Haitao
    Payeur, Pierre
    ALGORITHMS, 2024, 17 (12)
  • [7] Motion Complement and Temporal Multifocusing for Skeleton-Based Action Recognition
    Wu, Cong
    Wu, Xiao-Jun
    Xu, Tianyang
    Shen, Zhongwei
    Kittler, Josef
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 34 - 45
  • [8] Leveraging uncertainty-guided spatial-temporal mutuality for skeleton-based action recognition
    Wu, Kunlun
    Peng, Bo
    Zhai, Donghai
    APPLIED SOFT COMPUTING, 2025, 171
  • [9] Representation Learning of Temporal Dynamics for Skeleton-Based Action Recognition
    Du, Yong
    Fu, Yun
    Wang, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (07) : 3010 - 3022
  • [10] Improved semantic-guided network for skeleton-based action recognition
    Mansouri, Amine
    Bakir, Toufik
    Elzaar, Abdellah
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 104