A generically Contrastive Spatiotemporal Representation Enhancement for 3D skeleton action recognition

被引:0
作者
Zhang, Shaojie [1 ]
Yin, Jianqin [1 ]
Dang, Yonghao [1 ]
机构
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
基金
中国国家自然科学基金;
关键词
Human-robot interaction; Action recognition; Skeleton; Contrastive learning; NETWORKS;
D O I
10.1016/j.patcog.2025.111521
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Skeleton-based action recognition is a central task in computer vision and human-robot interaction. However, most previous methods suffer from overlooking the explicit exploitation of the latent data distributions (i.e., the intra-class variations and inter-class relations), thereby leading to confusion about ambiguous samples and sub-optimum solutions of the skeleton encoders. To mitigate this, we propose a Contrastive Spatiotemporal Representation Enhancement (CSRE) framework to obtain more discriminative representations from the sequences, which can be incorporated into various previous skeleton encoders and can be removed when testing. Specifically, we decompose the representation into spatial- and temporal-specific features to explore fine-grained motion patterns along the corresponding dimensions. Furthermore, to explicitly exploit the latent data distributions, we employ the attentive features to contrastive learning, which models the cross-sequence semantic relations by pulling together the features from the positive pairs and pushing away the negative pairs. Extensive experiments show that CSRE with five various skeleton encoders (HCN, 2S-AGCN, CTR-GCN, Hyperformer, and BlockGCN) achieves solid improvements on five benchmarks. The code will be released at https://github.com/zhshj0110/CSRE.
引用
收藏
页数:9
相关论文
共 42 条
[1]   Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields [J].
Cao, Zhe ;
Simon, Tomas ;
Wei, Shih-En ;
Sheikh, Yaser .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1302-1310
[2]   Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [J].
Chen, Tailin ;
Zhou, Desen ;
Wang, Jian ;
Wang, Shidong ;
Guan, Yu ;
He, Xuming ;
Ding, Errui .
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, :4334-4342
[3]  
Chen T, 2020, PR MACH LEARN RES, V119
[4]   Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition [J].
Chen, Yuxin ;
Zhang, Ziqi ;
Yuan, Chunfeng ;
Li, Bing ;
Deng, Ying ;
Hu, Weiming .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :13339-13348
[5]  
Cheng K, 2020, Img Proc Comp Vis Re, V12369, P536, DOI 10.1007/978-3-030-58586-0_32
[6]   Skeleton-Based Action Recognition with Shift Graph Convolutional Network [J].
Cheng, Ke ;
Zhang, Yifan ;
He, Xiangyu ;
Chen, Weihan ;
Cheng, Jian ;
Lu, Hanqing .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :180-189
[7]   InfoGCN: Representation Learning for Human Skeleton-based Action Recognition [J].
Chi, Hyung-gun ;
Ha, Myoung Hoon ;
Chi, Seunggeun ;
Lee, Sang Wan ;
Huang, Qixing ;
Ramani, Karthik .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :20154-20164
[8]   Deep Learning Based 2D Human Pose Estimation: A Survey [J].
Dang, Qi ;
Yin, Jianqin ;
Wang, Bin ;
Zheng, Wenqing .
TSINGHUA SCIENCE AND TECHNOLOGY, 2019, 24 (06) :663-676
[9]   DWnet: Deep-wide network for 3D action recognition [J].
Dang, Yonghao ;
Yang, Fuxing ;
Yin, Jianqin .
ROBOTICS AND AUTONOMOUS SYSTEMS, 2020, 126
[10]  
Dong JF, 2023, AAAI CONF ARTIF INTE, P525