OTM-HC: Enhanced Skeleton-Based Action Representation via One-to-Many Hierarchical Contrastive Learning

Cited by: 0
Authors
Usman, Muhammad [1 ,2 ,3 ]
Cao, Wenming [1 ,2 ,3 ]
Huang, Zhao [4 ]
Zhong, Jianqi [1 ,2 ,3 ]
Ji, Ruiya [5 ]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China
[2] Guangdong Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
[3] Shenzhen Univ, Shenzhen 518060, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle NE1 8ST, England
[5] Queen Mary Univ London, Dept Comp Sci, London E1 4NS, England
Funding
National Natural Science Foundation of China;
Keywords
skeleton-based action representation learning; unsupervised learning; hierarchical contrastive learning; one-to-many; GRAPH CONVOLUTIONAL NETWORKS; LSTM;
DOI
10.3390/ai5040106
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition has become crucial in computer vision, with growing applications in surveillance, human-computer interaction, and healthcare. Traditional approaches often use broad feature representations, which may miss subtle variations in timing and movement within action sequences. Our proposed One-to-Many Hierarchical Contrastive Learning (OTM-HC) framework maps the input into multi-layered feature vectors, creating a hierarchical contrastive representation that captures various granularities within the temporal and spatial domains of a human skeleton sequence. Using sequence-to-sequence (Seq2Seq) transformer encoders and downsampling modules, OTM-HC learns to distinguish among multiple levels of action representation: instance, domain, clip, and part. Each level contributes significantly to a comprehensive understanding of action representations. The OTM-HC model design is adaptable, ensuring smooth integration with advanced Seq2Seq encoders. We tested the OTM-HC framework across four datasets, demonstrating improved performance over state-of-the-art models. Specifically, OTM-HC achieved improvements of 0.9% and 0.6% on NTU60, 0.4% and 0.7% on NTU120, and 0.7% and 0.3% on PKU-MMD I and II, respectively, surpassing previous leading approaches across these datasets. These results showcase the robustness and adaptability of our model for various skeleton-based action recognition tasks.
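To make the one-to-many idea concrete, the sketch below (plain PyTorch, not taken from the paper) shows how a single transformer encoder plus average-pooling downsampling can emit embeddings at several temporal granularities, and how one anchor view can then be contrasted against all levels of a second augmented view with an InfoNCE-style loss. All layer sizes, the in-batch negative scheme, and the names OTMHCSketch and one_to_many_nce are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OTMHCSketch(nn.Module):
    # Hypothetical stand-in for the paper's Seq2Seq encoder + downsampling
    # modules; all sizes here are placeholders, not the published configuration.
    def __init__(self, in_dim=75, d_model=256, n_levels=3):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)  # temporal downsampling
        self.n_levels = n_levels

    def forward(self, x):
        # x: (batch, frames, in_dim), with joint coordinates flattened per frame
        h = self.encoder(self.proj(x))                         # (B, T, D)
        levels = []
        for _ in range(self.n_levels):
            levels.append(F.normalize(h.mean(dim=1), dim=-1))  # pooled level embedding
            h = self.pool(h.transpose(1, 2)).transpose(1, 2)   # halve temporal length
        return levels  # one (B, D) embedding per hierarchy level

def one_to_many_nce(anchor, positives, temperature=0.1):
    # One anchor embedding contrasted against several hierarchical positives;
    # other samples in the batch serve as negatives (a SimCLR-style assumption).
    loss = 0.0
    labels = torch.arange(anchor.size(0), device=anchor.device)
    for pos in positives:
        logits = anchor @ pos.t() / temperature        # (B, B) cosine similarities
        loss = loss + F.cross_entropy(logits, labels)  # diagonal entries are positives
    return loss / len(positives)

# Usage: two augmented views of the same batch of skeleton sequences
# (8 sequences, 64 frames, 25 joints x 3 coordinates).
model = OTMHCSketch()
x1, x2 = torch.randn(8, 64, 75), torch.randn(8, 64, 75)
z1, z2 = model(x1), model(x2)
loss = one_to_many_nce(z1[0], z2)  # finest level of view 1 vs. all levels of view 2

Note that this sketch only illustrates the temporal side of the hierarchy; the paper's part-level contrast presumably operates over body-part joint groupings in the spatial domain, which is omitted here.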
Pages: 2170-2186
Number of pages: 17