Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning for Point Cloud Understanding

Cited by: 25
Authors
Wu, Yue [1 ]
Liu, Jiaming [1 ]
Gong, Maoguo [2 ]
Gong, Peiran [1 ]
Fan, Xiaolong [2 ]
Qin, A. K. [3 ]
Miao, Qiguang [1 ]
Ma, Wenping [4 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Peoples R China
[2] Xidian Univ, Sch Elect Engn, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Peoples R China
[3] Swinburne Univ Technol, Dept Comp Sci & Software Engn, Hawthorn, Vic 3122, Australia
[4] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Three-dimensional displays; Task analysis; Feature extraction; Self-supervised learning; Image color analysis; Visualization; Self-supervision; cross-modal learning; joint; 3D-2D; point cloud understanding; NETWORK;
DOI
10.1109/TMM.2023.3284591
CLC Number
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
Learning effective representations from unlabeled data is a challenging task for point cloud understanding. Since the human visual system can map concepts learned from 2D images to the 3D world, and inspired by recent multimodal research, we introduce data from the point cloud modality and the image modality for joint learning. Based on the properties of point clouds and images, we propose CrossNet, a comprehensive intra- and cross-modal contrastive learning method that learns 3D point cloud representations. The proposed method establishes 3D-3D and 3D-2D correspondence objectives by maximizing the consistency between point clouds and their augmented versions, and between point clouds and their corresponding rendered images, in an invariant space. We further distinguish the rendered images into RGB and grayscale images to extract color and geometric features, respectively. These training objectives exploit feature correspondences between modalities to combine rich learning signals from point clouds and images. Our CrossNet is simple: we add a feature extraction module and a projection head module to the point cloud and image branches, respectively, to train the backbone network in a self-supervised manner. After the network is pretrained, only the point cloud feature extraction module is required for fine-tuning and directly predicting results on downstream tasks. Our experiments on multiple benchmarks demonstrate improved point cloud classification and segmentation results, and the learned representations generalize across domains.
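The abstract describes combining an intra-modal objective (point cloud vs. its augmented view) with a cross-modal objective (point cloud vs. rendered image) via contrastive learning. A minimal sketch of such a combined loss, using a standard InfoNCE formulation over projected embeddings, is shown below; the function names, the weighting factor `alpha`, and the use of plain NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE contrastive loss: row i of z_a and row i of z_b are a
    positive pair; all other rows in the batch serve as negatives."""
    # Project embeddings onto the unit sphere so logits are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Log-softmax over each row; positives sit on the diagonal
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def joint_contrastive_loss(z_pc, z_pc_aug, z_img, alpha=0.5):
    """Weighted sum of an intra-modal term (3D-3D) and a
    cross-modal term (3D-2D), as the abstract outlines."""
    intra = info_nce(z_pc, z_pc_aug)   # point cloud vs. augmented point cloud
    cross = info_nce(z_pc, z_img)      # point cloud vs. rendered-image embedding
    return alpha * intra + (1 - alpha) * cross
```

When the paired embeddings are well aligned, the diagonal dominates each softmax row and the loss drops toward zero; mismatched pairs push the loss toward log(batch size), which is what drives the two branches toward a shared invariant space.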
Pages: 1626-1638
Page count: 13
Related Papers
50 records total
  • [1] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
    Afham, Mohamed
    Dissanayake, Isuru
    Dissanayake, Dinithi
    Dharmasiri, Amaya
    Thilakarathna, Kanchana
    Rodrigo, Ranga
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9892 - 9902
  • [2] SCPNet: Unsupervised Cross-Modal Homography Estimation via Intra-modal Self-supervised Learning
    Zhang, Runmin
    Ma, Jun
    Cao, Si-Yuan
    Luo, Lun
    Yu, Beinan
    Chen, Shu-Jie
    Li, Junwei
    Shen, Hui-Liang
    COMPUTER VISION - ECCV 2024, PT XXIII, 2025, 15081 : 460 - 477
  • [3] Self-Supervised Correlation Learning for Cross-Modal Retrieval
    Liu, Yaxin
    Wu, Jianlong
    Qu, Leigang
    Gan, Tian
    Yin, Jianhua
    Nie, Liqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2851 - 2863
  • [4] Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search
    Liang, Meiyu
    Du, Junping
    Liang, Zhengyang
    Xing, Yongwang
    Huang, Wei
    Xue, Zhe
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13744 - 13753
  • [5] RELATIONS BETWEEN INTRA-MODAL AND CROSS-MODAL MATCHING
    BJORKMAN, M
    SCANDINAVIAN JOURNAL OF PSYCHOLOGY, 1967, 8 (02) : 65 -
  • [6] Cross-modal Self-Supervised Learning for Lip Reading: When Contrastive Learning meets Adversarial Training
    Sheng, Changchong
    Pietikainen, Matti
    Tian, Qi
    Liu, Li
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2456 - 2464
  • [7] SELF-SUPERVISED LEARNING WITH CROSS-MODAL TRANSFORMERS FOR EMOTION RECOGNITION
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 381 - 388
  • [8] COMPARISON OF CROSS-MODAL AND INTRA-MODAL FORM RECOGNITION IN CHILDREN WITH LEARNING DISABILITIES
    GAINES, BJ
    RASKIN, LM
    JOURNAL OF LEARNING DISABILITIES, 1970, 3 (05) : 243 - 246
  • [9] INTRA-MODAL AND CROSS-MODAL MATCHING IN ALZHEIMERS-DISEASE
    BUTTERS, MA
    RAPCSAK, SZ
    KASZNIAK, AW
    BONDI, MW
    JOURNAL OF CLINICAL AND EXPERIMENTAL NEUROPSYCHOLOGY, 1990, 12 (01) : 21 - 21
  • [10] Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
    Das, Srijan
    Ryoo, Michael
    2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,