Progressive semantic learning for unsupervised skeleton-based action recognition

Cited by: 0
Authors
Qin, Hao [1]
Chen, Luyuan [2]
Kong, Ming [1]
Zhao, Zhuoran [1]
Zeng, Xianzhou [1]
Lu, Mengxu [1]
Zhu, Qiang [1]
Affiliations
[1] Zhejiang Univ, Sch Comp Sci & Technol, Hangzhou 310013, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Comp Sch, Beijing 100101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Unsupervised learning; Semantic learning; Progressive optimization;
DOI
10.1007/s10994-024-06667-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional contrastive learning frameworks for skeleton-based action recognition use data augmentation and memory banks to obtain the positive and negative samples required for training. However, this instance-level pseudo-label generation mechanism does not take full advantage of the rich cluster-level semantic information contained in human skeleton sequences. In this paper, we propose Progressive Semantic Learning (ProSL), a method that gradually optimizes the pseudo-label generation mechanism in self-supervised contrastive learning through an iterative framework, so that representation learning can effectively capture action semantics. Specifically, an existing contrastive learning method first produces an initial skeleton encoder. On the basis of this encoder, clustering is then applied to generate a Codebook containing the semantic information of human actions, which in turn is used to improve the pseudo-label generation mechanism. Iterating these two steps achieves progressive semantic learning and yields a more reasonable skeleton encoder. Extensive experiments on four datasets demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on multiple downstream tasks.
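The abstract does not say which clustering algorithm builds the Codebook or how the Codebook changes pseudo-label generation. As an illustration only, and not the authors' implementation, the sketch below assumes spherical k-means over encoder embeddings and assumes that samples sharing a cluster are no longer treated as negatives for one another. The names `build_codebook` and `negative_mask`, and all parameters, are hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def assign_clusters(feats, codebook):
    """Assign each feature to the codebook entry with highest cosine similarity."""
    return [
        max(range(len(codebook)), key=lambda c: dot(f, codebook[c]))
        for f in feats
    ]


def build_codebook(embeddings, k, iters=20, seed=0):
    """Assumed clustering step: spherical k-means over encoder embeddings.

    Returns (codebook, assignments); the codebook holds k unit-length
    cluster centres, and assignments[i] is the cluster index of sample i.
    """
    rng = random.Random(seed)
    feats = [l2_normalize(e) for e in embeddings]
    codebook = [list(f) for f in rng.sample(feats, k)]
    for _ in range(iters):
        assign = assign_clusters(feats, codebook)
        for c in range(k):
            members = [f for f, a in zip(feats, assign) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                codebook[c] = l2_normalize(mean)
    return codebook, assign_clusters(feats, codebook)


def negative_mask(assign):
    """Assumed pseudo-label refinement: mask[i][j] is True only when
    sample j lies in a different cluster from sample i, so j may be
    used as a negative for i in the contrastive loss."""
    return [[ai != aj for aj in assign] for ai in assign]


# Toy embeddings standing in for skeleton-encoder outputs.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
codebook, assign = build_codebook(toy, k=2)
mask = negative_mask(assign)
```

In an iterative scheme of the kind the abstract describes, the encoder would be retrained with the refined mask and the Codebook rebuilt from the new embeddings; that outer loop is omitted here because the record gives no details about it.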
Pages: 20