Trusted 3D self-supervised representation learning with cross-modal settings

Cited: 0
Authors
Han, Xu [1 ]
Cheng, Haozhe [1 ]
Shi, Pengcheng [1 ]
Zhu, Jihua [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software, Xian 710049, Shaanxi, Peoples R China
Keywords
Point clouds; Cross-modal learning; Self-supervised representation learning; Contrastive learning; Uncertainty
DOI
10.1007/s00138-024-01556-w
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Cross-modal settings that pair 2D images with 3D point clouds in self-supervised representation learning have proven effective for enhancing visual perception. However, different modalities have different data formats and representations, and directly fusing features extracted from cross-modal datasets may lead to conflicting information and feature collapse. We refer to this problem as uncertainty in network learning; reducing uncertainty to obtain trusted descriptions therefore becomes the key to improving network performance. Motivated by this, we propose a trusted cross-modal network for self-supervised learning (TCMSS). It obtains trusted descriptions through a trusted combination module and improves network performance with a well-designed loss function. In the trusted combination module, we use the Dirichlet distribution and subjective logic to parameterize the features and to quantify their uncertainty at the same time. The Dempster-Shafer Theory (DST) is then used to obtain trusted descriptions by weighting the parameterized results with their uncertainty. We have also designed a trusted domain loss, comprising a domain loss and a trusted loss, which effectively improves the prediction accuracy of the network by applying contrastive learning between different feature descriptions. Experimental results show that our model outperforms previous results on linear classification on ScanObjectNN and on few-shot classification on both ModelNet40 and ScanObjectNN. In addition, part segmentation on ShapeNet likewise surpasses previous methods. Further, ablation studies validate the effectiveness of our method for better point cloud understanding.
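To make the trusted combination concrete, below is a minimal sketch (in Python with NumPy, not the authors' code) of the evidential-fusion pattern the abstract describes: per-modality evidence is mapped through a Dirichlet parameterization and subjective logic to belief masses plus an explicit uncertainty mass, and two modalities are fused with a reduced Dempster-Shafer combination rule common in evidential deep learning. All function and variable names are illustrative assumptions; the paper's exact formulation may differ.

```python
# Illustrative sketch of Dirichlet/subjective-logic uncertainty and a reduced
# Dempster-Shafer combination of two modalities. Not the authors' code.
import numpy as np

def subjective_opinion(evidence):
    """Map non-negative evidence for K classes to (belief, uncertainty).

    Dirichlet parameters: alpha_k = e_k + 1; Dirichlet strength: S = sum(alpha);
    belief mass: b_k = e_k / S; uncertainty mass: u = K / S (so sum(b) + u = 1).
    """
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.size
    alpha = evidence + 1.0
    S = alpha.sum()
    belief = evidence / S
    u = K / S
    return belief, u

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two subjective opinions."""
    # Conflict C: belief mass the two modalities assign to contradictory classes.
    C = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 / (1.0 - C)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    return b, u

# Toy example: a confident image branch and an uncertain point-cloud branch.
b_img, u_img = subjective_opinion([9.0, 1.0, 0.5])   # strong evidence -> low u
b_pcd, u_pcd = subjective_opinion([0.5, 0.4, 0.3])   # weak evidence -> high u
b, u = ds_combine(b_img, u_img, b_pcd, u_pcd)
print("fused belief:", b.round(3), "fused uncertainty:", round(u, 3))
```

Under this rule a modality with weak evidence contributes little belief, so a confident branch is not dragged down by an uncertain one; this is the intuition behind weighting the parameterized results by their uncertainty before fusion.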
Pages: 14