Representation modeling learning with multi-domain decoupling for unsupervised skeleton-based action recognition

Cited: 0
Authors
He, Zhiquan [1 ,2 ]
Lv, Jiantu [2 ]
Fang, Shizhang [2 ]
Affiliations
[1] Guangdong Key Lab Intelligent Informat Proc, Shenzhen, Peoples R China
[2] Shenzhen Univ, Guangdong Multimedia Informat Serv Engn Technol Re, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Unsupervised learning; Contrastive learning; Action recognition;
DOI
10.1016/j.neucom.2024.127495
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Skeleton-based action recognition is a fundamental research topic in computer vision. In recent years, the unsupervised contrastive learning paradigm has achieved great success in skeleton-based action recognition. However, previous work has often treated input skeleton sequences as a whole when performing comparisons, lacking fine-grained contrastive learning of representations. We therefore propose a contrastive learning method for Representation Modeling with Multi-domain Decoupling (RMMD), which extracts the most significant representations from input skeleton sequences in the temporal, spatial, and frequency domains, respectively. Specifically, in the temporal and spatial domains, we propose a multi-level spatiotemporal mining reconstruction module (STMR) that iteratively reconstructs the original input skeleton sequences to highlight the spatiotemporal representations of different actions. At the same time, we introduce positional encoding and a global adaptive attention matrix, balancing global and local information and effectively modeling the spatiotemporal dependencies between joints. In the frequency domain, we use the discrete cosine transform (DCT) for temporal-frequency conversion, discard part of the interference information, and apply frequency self-attention (FSA) and a multi-level aggregation perceptron (MLAP) to deeply explore the frequency-domain representation. The fusion of the temporal-, spatial-, and frequency-domain representations makes our model more discriminative across different actions. We verify the effectiveness of the model on the NTU RGB+D and PKU-MMD datasets. Extensive experiments show that our method outperforms existing unsupervised methods and achieves significant performance improvements in downstream tasks such as action recognition and action retrieval.
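To illustrate the temporal-frequency conversion step described in the abstract, the sketch below applies a DCT-II along the time axis of a skeleton sequence and zeroes out high-frequency coefficients. This is a minimal illustration only; the function name, the `keep_ratio` parameter, and the simple truncation rule are assumptions for demonstration, not the authors' RMMD implementation.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_frequency_filter(seq, keep_ratio=0.5):
    """Convert a skeleton sequence (T frames, J joints, C coords) to the
    frequency domain along time with a DCT-II, zero out high-frequency
    coefficients, and return the truncated spectrum plus a reconstruction."""
    T = seq.shape[0]
    coeffs = dct(seq, type=2, axis=0, norm="ortho")    # temporal -> frequency
    k = max(1, int(T * keep_ratio))                    # low-frequency bins to keep
    coeffs[k:] = 0.0                                   # discard high-frequency content
    recon = idct(coeffs, type=2, axis=0, norm="ortho") # frequency -> temporal
    return coeffs, recon

# Toy sequence: 16 frames, 25 joints (the NTU RGB+D joint layout), 3D coordinates.
rng = np.random.default_rng(0)
seq = rng.standard_normal((16, 25, 3))
coeffs, recon = dct_frequency_filter(seq, keep_ratio=0.25)
```

With `keep_ratio=1.0` the orthonormal DCT/IDCT pair reconstructs the input exactly; smaller ratios act as a low-pass filter, which is one common way to suppress high-frequency jitter before frequency-domain feature extraction.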
Pages: 11
Related Papers
50 records
  • [41] Multi-Domain Based Dynamic Graph Representation Learning for EEG Emotion Recognition
    Tang, Hao
    Xie, Songyun
    Xie, Xinzhou
    Cui, Yujie
    Li, Bohan
    Zheng, Dalu
    Hao, Yu
    Wang, Xiangming
    Jiang, Yiye
    Tian, Zhongyu
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (09) : 5227 - 5238
  • [42] Global-Local Motion Transformer for Unsupervised Skeleton-Based Action Learning
    Kim, Boeun
    Chang, Hyung Jin
    Kim, Jungho
    Choi, Jin Young
    COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 209 - 225
  • [44] Multi-source Learning for Skeleton-based Action Recognition Using Deep LSTM Networks
    Cui, Ran
    Zhu, Aichun
    Zhang, Sai
    Hua, Gang
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 547 - 552
  • [45] Multi-Dimensional Dynamic Topology Learning Graph Convolution for Skeleton-Based Action Recognition
    Luo H.-L.
    Cao L.-J.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (03): : 991 - 1001
  • [46] Multi-grained clip focus for skeleton-based action recognition
    Qiu, Helei
    Hou, Biao
    PATTERN RECOGNITION, 2024, 148
  • [47] Multi-Term Attention Networks for Skeleton-Based Action Recognition
    Diao, Xiaolei
    Li, Xiaoqiang
    Huang, Chen
    APPLIED SCIENCES-BASEL, 2020, 10 (15):
  • [48] Adaptive multi-level graph convolution with contrastive learning for skeleton-based action recognition
    Geng, Pei
    Li, Haowei
    Wang, Fuyun
    Lyu, Lei
    SIGNAL PROCESSING, 2022, 201
  • [49] AL-SAR: Active Learning for Skeleton-Based Action Recognition
    Li, Jingyuan
    Le, Trung
    Shlizerman, Eli
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (11) : 16966 - 16974
  • [50] Adaptive Feature Selection With Reinforcement Learning for Skeleton-Based Action Recognition
    Xu, Zheyuan
    Wang, Yingfu
    Jiang, Jiaqin
    Yao, Jian
    Li, Liang
    IEEE ACCESS, 2020, 8 : 213038 - 213051