Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning

Cited by: 4
Authors
Zhang, Jiahang [1 ]
Lin, Lilang [1 ]
Liu, Jiaying [1 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
Skeleton-based action recognition; contrastive learning; masked modeling; self-supervised learning;
DOI
10.1145/3581783.3611774
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-supervised learning has proved effective for skeleton-based human action understanding, which is an important yet challenging topic. Previous works mainly rely on the contrastive learning or masked motion modeling paradigm to model skeleton relations. However, these methods cannot effectively handle sequence-level and joint-level representation learning simultaneously. As a result, the learned representations fail to generalize to different downstream tasks. Moreover, combining these two paradigms in a naive manner leaves the synergy between them untapped and can lead to interference in training. To address these problems, we propose Prompted Contrast with Masked Motion Modeling, PCM3, for versatile 3D action representation learning. Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner, which substantially boosts the generalization capacity for various downstream tasks. Specifically, masked prediction provides novel training views for contrastive learning, which in turn guides the masked prediction training with high-level semantic information. Moreover, we propose a dual-prompted multi-task pretraining strategy, which further improves model representations by reducing the interference caused by learning the two different pretext tasks. Extensive experiments on five downstream tasks across three large-scale datasets demonstrate the superior generalization capacity of PCM3 compared to state-of-the-art works. Our project is publicly available at: https://jhang2020.github.io/Projects/PCM3/PCM3.html.
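To make the abstract's description more concrete, below is a minimal PyTorch-style sketch of a combined "contrast + masked motion modeling" pretraining step. All names (SkeletonEncoder, MotionDecoder, nt_xent_loss, pretrain_step) and design details (toy GRU encoder, additive prompt vectors, jitter augmentation, equal loss weighting) are illustrative assumptions, not the authors' released PCM3 implementation; refer to the project page for the official code.

# Hedged sketch of one joint contrastive + masked-motion pretraining step.
# All module/function names below are hypothetical stand-ins, not PCM3's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkeletonEncoder(nn.Module):
    """Toy encoder: flattens joints per frame and runs a GRU over time."""
    def __init__(self, num_joints=25, channels=3, dim=128):
        super().__init__()
        self.gru = nn.GRU(num_joints * channels, dim, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, prompt=None):
        # x: (B, T, J, C) skeleton sequence
        b, t, j, c = x.shape
        h = x.reshape(b, t, j * c)
        if prompt is not None:          # task-conditioning prompt, broadcast over frames
            h = h + prompt
        feats, _ = self.gru(h)          # (B, T, dim): frame-level features
        seq = self.proj(feats.mean(1))  # (B, dim): sequence-level embedding
        return feats, seq


class MotionDecoder(nn.Module):
    """Toy decoder that regresses joint coordinates from frame features."""
    def __init__(self, num_joints=25, channels=3, dim=128):
        super().__init__()
        self.num_joints, self.channels = num_joints, channels
        self.head = nn.Linear(dim, num_joints * channels)

    def forward(self, feats):
        b, t, _ = feats.shape
        return self.head(feats).reshape(b, t, self.num_joints, self.channels)


def nt_xent_loss(z1, z2, tau=0.1):
    """Normalized-temperature cross-entropy (SimCLR-style) contrastive loss."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    eye = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))                # exclude self-similarity
    targets = torch.arange(2 * n, device=z.device).roll(n)   # positives: i <-> i + n
    return F.cross_entropy(sim, targets)


def pretrain_step(x, encoder, decoder, prompts, mask_ratio=0.5):
    """One multi-task step: contrast two views, reconstruct masked frames,
    and treat the masked branch's embedding as an extra contrastive view."""
    v1 = x + 0.01 * torch.randn_like(x)   # stand-in data augmentations
    v2 = x + 0.01 * torch.randn_like(x)

    # Contrastive branch, conditioned on its own prompt.
    _, s1 = encoder(v1, prompts["contrast"])
    _, s2 = encoder(v2, prompts["contrast"])
    loss_con = nt_xent_loss(s1, s2)

    # Masked motion modeling branch, conditioned on the other prompt.
    mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio   # (B, T) frame mask
    feats, s_rec = encoder(x * (~mask)[..., None, None], prompts["mask"])
    loss_mask = F.mse_loss(decoder(feats)[mask], x[mask])          # only masked frames

    # Bridge: the reconstruction branch is contrasted against a clean view.
    loss_bridge = nt_xent_loss(s_rec, s1.detach())
    return loss_con + loss_mask + loss_bridge


# Usage with random data shaped (batch, frames, joints, xyz).
encoder, decoder = SkeletonEncoder(), MotionDecoder()
prompts = {k: nn.Parameter(torch.zeros(25 * 3)) for k in ("contrast", "mask")}
loss = pretrain_step(torch.randn(4, 16, 25, 3), encoder, decoder, prompts)
loss.backward()

In this sketch, the bridge term loosely mirrors the abstract's idea that masked prediction supplies novel training views for contrastive learning, and the per-task prompt vectors stand in for the dual-prompted multi-task strategy; the actual architecture, masking scheme, and loss weighting used by PCM3 are specified in the paper.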
Pages: 7175 - 7183
Page count: 9