OTM-HC: Enhanced Skeleton-Based Action Representation via One-to-Many Hierarchical Contrastive Learning

Cited by: 0
Authors
Usman, Muhammad [1 ,2 ,3 ]
Cao, Wenming [1 ,2 ,3 ]
Huang, Zhao [4 ]
Zhong, Jianqi [1 ,2 ,3 ]
Ji, Ruiya [5 ]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China
[2] Guangdong Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
[3] Shenzhen Univ, Shenzhen 518060, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle NE1 8ST, England
[5] Queen Mary Univ London, Dept Comp Sci, London E1 4NS, England
Funding
National Natural Science Foundation of China;
Keywords
skeleton-based action representation learning; unsupervised learning; hierarchical contrastive learning; one-to-many; GRAPH CONVOLUTIONAL NETWORKS; LSTM;
DOI
10.3390/ai5040106
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition has become crucial in computer vision, with growing applications in surveillance, human-computer interaction, and healthcare. Traditional approaches often use broad feature representations, which may miss subtle variations in timing and movement within action sequences. Our proposed One-to-Many Hierarchical Contrastive Learning (OTM-HC) framework maps the input into multi-layered feature vectors, creating a hierarchical contrastive representation that captures various granularities within the temporal and spatial domains of a human skeleton sequence. Using sequence-to-sequence (Seq2Seq) transformer encoders and downsampling modules, OTM-HC learns to distinguish among multiple levels of action representation: instance, domain, clip, and part. Each level contributes significantly to a comprehensive understanding of action representations. The OTM-HC model design is adaptable, ensuring smooth integration with advanced Seq2Seq encoders. We tested the OTM-HC framework across four datasets, demonstrating improved performance over state-of-the-art models. Specifically, OTM-HC achieved improvements of 0.9% and 0.6% on NTU60, 0.4% and 0.7% on NTU120, and 0.7% and 0.3% on PKU-MMD I and II, respectively, surpassing previous leading approaches across these datasets. These results showcase the robustness and adaptability of our model for various skeleton-based action recognition tasks.
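To make the one-to-many idea concrete, the sketch below (plain PyTorch, not taken from the paper) shows how a single transformer encoder plus average-pooling downsampling can emit embeddings at several temporal granularities, and how one anchor view can then be contrasted against all levels of a second augmented view with an InfoNCE-style loss. All layer sizes, the in-batch negative scheme, and the names OTMHCSketch and one_to_many_nce are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OTMHCSketch(nn.Module):
    # Hypothetical stand-in for the paper's Seq2Seq encoder + downsampling
    # modules; all sizes here are placeholders, not the published configuration.
    def __init__(self, in_dim=75, d_model=256, n_levels=3):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)  # temporal downsampling
        self.n_levels = n_levels

    def forward(self, x):
        # x: (batch, frames, in_dim), with joint coordinates flattened per frame
        h = self.encoder(self.proj(x))                         # (B, T, D)
        levels = []
        for _ in range(self.n_levels):
            levels.append(F.normalize(h.mean(dim=1), dim=-1))  # pooled level embedding
            h = self.pool(h.transpose(1, 2)).transpose(1, 2)   # halve temporal length
        return levels  # one (B, D) embedding per hierarchy level

def one_to_many_nce(anchor, positives, temperature=0.1):
    # One anchor embedding contrasted against several hierarchical positives;
    # other samples in the batch serve as negatives (a SimCLR-style assumption).
    loss = 0.0
    labels = torch.arange(anchor.size(0), device=anchor.device)
    for pos in positives:
        logits = anchor @ pos.t() / temperature        # (B, B) cosine similarities
        loss = loss + F.cross_entropy(logits, labels)  # diagonal entries are positives
    return loss / len(positives)

# Usage: two augmented views of the same batch of skeleton sequences
# (8 sequences, 64 frames, 25 joints x 3 coordinates).
model = OTMHCSketch()
x1, x2 = torch.randn(8, 64, 75), torch.randn(8, 64, 75)
z1, z2 = model(x1), model(x2)
loss = one_to_many_nce(z1[0], z2)  # finest level of view 1 vs. all levels of view 2

Note that this sketch only illustrates the temporal side of the hierarchy; the paper's part-level contrast presumably operates over body-part joint groupings in the spatial domain, which is omitted here.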
Pages: 2170-2186
Number of pages: 17