Representation modeling learning with multi-domain decoupling for unsupervised skeleton-based action recognition

Times Cited: 0
Authors
He, Zhiquan [1 ,2 ]
Lv, Jiantu [2 ]
Fang, Shizhang [2 ]
Affiliations
[1] Guangdong Key Lab Intelligent Informat Proc, Shenzhen, Peoples R China
[2] Shenzhen Univ, Guangdong Multimedia Informat Serv Engn Technol Re, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Unsupervised learning; Contrastive learning; Action recognition;
DOI
10.1016/j.neucom.2024.127495
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skeleton-based action recognition is a fundamental research topic in computer vision. In recent years, the unsupervised contrastive learning paradigm has achieved great success in skeleton-based action recognition. However, previous work has often treated input skeleton sequences as a whole when performing comparisons, lacking fine-grained contrastive learning of representations. Therefore, we propose a contrastive learning method for Representation Modeling with Multi-domain Decoupling (RMMD), which extracts the most significant representations from input skeleton sequences in the temporal, spatial and frequency domains, respectively. Specifically, in the temporal and spatial domains, we propose a multi-level spatiotemporal mining reconstruction module (STMR) that iteratively reconstructs the original input skeleton sequences to highlight spatiotemporal representations under different actions. At the same time, we introduce position encoding and a global adaptive attention matrix, balancing global and local information and effectively modeling the spatiotemporal dependencies between joints. In the frequency domain, we use the discrete cosine transform (DCT) to achieve temporal-frequency conversion, discard part of the interference information, and use frequency self-attention (FSA) and a multi-level aggregation perceptron (MLAP) to deeply explore the frequency-domain representation. The fusion of the temporal-, spatial- and frequency-domain representations makes our model more discriminative in representing different actions. In addition, we verify the effectiveness of the model on the NTU RGB+D and PKU-MMD datasets. Extensive experiments show that our method outperforms existing unsupervised methods and achieves significant performance improvements in downstream tasks such as action recognition and action retrieval.
Pages: 11
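The frequency-domain branch described in the abstract rests on a standard temporal DCT. As a rough illustration (not the authors' implementation), the sketch below applies an orthonormal DCT-II along the time axis of a skeleton sequence and keeps only the lowest-frequency coefficients, mirroring the idea of discarding part of the interference information before further processing. The array shapes, the keep_ratio parameter, and the NumPy-based formulation are illustrative assumptions.

import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of size (n, n)."""
    k = np.arange(n)[:, None]          # frequency index
    t = np.arange(n)[None, :]          # time index
    basis = np.cos(np.pi * (t + 0.5) * k / n) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)        # k = 0 row gets the extra 1/sqrt(2) factor
    return basis

def temporal_dct_filter(seq: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """seq: (T, J, C) skeleton sequence -> low-frequency DCT coefficients.

    T = frames, J = joints, C = coordinate channels. Only the first
    keep_ratio * T frequency components are retained; the rest are dropped.
    """
    T = seq.shape[0]
    D = dct_matrix(T)                            # (T, T) transform matrix
    coeffs = np.einsum('ft,tjc->fjc', D, seq)    # DCT along the time axis
    keep = max(1, int(T * keep_ratio))
    return coeffs[:keep]                         # (keep, J, C)

# Example: a 64-frame sequence with 25 joints in 3D (NTU RGB+D joint layout).
x = np.random.randn(64, 25, 3).astype(np.float32)
z = temporal_dct_filter(x, keep_ratio=0.5)
print(z.shape)   # (32, 25, 3)

In the paper's pipeline these truncated frequency coefficients would then be fed to the FSA and MLAP modules; the sketch stops at the temporal-frequency conversion step.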