Representation modeling learning with multi-domain decoupling for unsupervised skeleton-based action recognition

Citations: 0
Authors
He, Zhiquan [1, 2]
Lv, Jiantu [2 ]
Fang, Shizhang [2 ]
Affiliations
[1] Guangdong Key Lab Intelligent Informat Proc, Shenzhen, Peoples R China
[2] Shenzhen Univ, Guangdong Multimedia Informat Serv Engn Technol Re, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Unsupervised learning; Contrastive learning; Action recognition;
DOI
10.1016/j.neucom.2024.127495
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skeleton-based action recognition is a fundamental research topic in computer vision. In recent years, the unsupervised contrastive learning paradigm has achieved great success in skeleton-based action recognition. However, previous work often treated each input skeleton sequence as a whole when performing contrast, and therefore lacked fine-grained contrastive representation learning. We propose a contrastive learning method, Representation Modeling with Multi-domain Decoupling (RMMD), which extracts the most significant representations from input skeleton sequences in the temporal, spatial and frequency domains, respectively. Specifically, in the temporal and spatial domains, we propose a multi-level spatiotemporal mining reconstruction module (STMR) that iteratively reconstructs the original input skeleton sequences to highlight the spatiotemporal representations of different actions. We also introduce position encoding and a global adaptive attention matrix, which balance global and local information and effectively model the spatiotemporal dependencies between joints. In the frequency domain, we use the discrete cosine transform (DCT) to perform temporal-to-frequency conversion, discard part of the interference information, and use frequency self-attention (FSA) and a multi-level aggregation perceptron (MLAP) to explore the frequency-domain representation in depth. Fusing the temporal-, spatial- and frequency-domain representations makes our model more discriminative in representing different actions. We verify the effectiveness of the model on the NTU RGB+D and PKU-MMD datasets. Extensive experiments show that our method outperforms existing unsupervised methods and achieves significant performance improvements in downstream tasks such as action recognition and action retrieval.
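The frequency-domain step mentioned in the abstract (a DCT along the time axis, followed by discarding part of the coefficients) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dct_matrix` and `temporal_dct_lowpass`, the `(T, J, C)` tensor layout, and the choice to discard the highest-frequency coefficients are all assumptions made for illustration, since the abstract does not say which coefficients are dropped.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; row k holds frequency k."""
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # time index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis


def temporal_dct_lowpass(seq, keep):
    """Apply a DCT along time and zero all but the first `keep` coefficients.

    seq:  array of shape (T, J, C) = (frames, joints, coordinates); this
          layout is an assumption, not taken from the paper.
    keep: number of low-frequency coefficients retained (hypothetical knob).
    Returns (coeffs, recon), both of shape (T, J, C).
    """
    num_frames = seq.shape[0]
    basis = dct_matrix(num_frames)
    # Contract the time axis of seq with the basis: (T, T) x (T, J, C) -> (T, J, C)
    coeffs = np.tensordot(basis, seq, axes=(1, 0))
    coeffs[keep:] = 0.0  # assumed: treat high frequencies as interference
    # The basis is orthonormal, so its transpose is the inverse transform.
    recon = np.tensordot(basis.T, coeffs, axes=(1, 0))
    return coeffs, recon


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.normal(size=(16, 25, 3))  # 16 frames, 25 joints, xyz
    coeffs, recon = temporal_dct_lowpass(sequence, keep=4)
    print(coeffs.shape, recon.shape)
```

In this sketch the retained coefficients would be the input to later frequency-domain modules such as FSA and MLAP; those modules are not sketched here because the abstract gives no detail on their structure.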
Pages: 11
Related papers
50 in total
  • [31] Dual-domain graph convolutional networks for skeleton-based action recognition
    Chen, Shuo
    Xu, Ke
    Mi, Zhongjie
    Jiang, Xinghao
    Sun, Tanfeng
    MACHINE LEARNING, 2022, 111 (07) : 2381 - 2406
  • [32] Dual-domain graph convolutional networks for skeleton-based action recognition
    Shuo Chen
    Ke Xu
    Zhongjie Mi
    Xinghao Jiang
    Tanfeng Sun
    Machine Learning, 2022, 111 : 2381 - 2406
  • [33] SG-CLR: Semantic representation-guided contrastive learning for self-supervised skeleton-based action recognition
    Liu, Ruyi
    Liu, Yi
    Wu, Mengyao
    Xin, Wentian
    Miao, Qiguang
    Liu, Xiangzeng
    Lie, Long
    PATTERN RECOGNITION, 2025, 162
  • [34] Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition
    Chen, Tailin
    Zhou, Desen
    Wang, Jian
    Wang, Shidong
    Guan, Yu
    He, Xuming
    Ding, Errui
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4334 - 4342
  • [35] Spatiotemporal decoupling attention transformer for 3D skeleton-based driver action recognition
    Xu, Zhuoyan
    Xu, Jingke
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (04)
  • [36] Multi-scale motion contrastive learning for self-supervised skeleton-based action recognition
    Wu, Yushan
    Xu, Zengmin
    Yuan, Mengwei
    Tang, Tianchi
    Meng, Ruxing
    Wang, Zhongyuan
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [37] Enhanced view-independent representation method for skeleton-based human action recognition
    Jiang Y.
    Lu L.
    Xu J.
    International Journal of Information and Communication Technology, 2021, 19 (02) : 201 - 218
  • [38] Multi-Relational Graph Convolutional Networks for Skeleton-Based Action Recognition
    Liu, Fang
    Dai, Qin
    Wang, Shengze
    Zhao, Liang
    Shi, Xiangbin
    Qiao, Jianzhong
    2020 IEEE INTL SYMP ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, INTL CONF ON BIG DATA & CLOUD COMPUTING, INTL SYMP SOCIAL COMPUTING & NETWORKING, INTL CONF ON SUSTAINABLE COMPUTING & COMMUNICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2020), 2020, : 474 - 480
  • [39] Hard Sample Mining and Learning for Skeleton-Based Human Action Recognition and Identification
    Cui, Ran
    Hua, Gang
    Zhu, Aichun
    Wu, Jingran
    Liu, Haiqiang
    IEEE ACCESS, 2019, 7 : 8245 - 8257
  • [40] CdCLR: Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition
    Gao, Rong
    Liu, Xin
    Yang, Jingyu
    Yue, Huanjing
    2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022,