Trusted 3D self-supervised representation learning with cross-modal settings

Cited: 0
Authors
Han, Xu [1 ]
Cheng, Haozhe [1 ]
Shi, Pengcheng [1 ]
Zhu, Jihua [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software, Xian 710049, Shaanxi, Peoples R China
Keywords
Point clouds; Cross-modal learning; Self-supervised representation learning; Contrastive learning; Uncertainty
DOI
10.1007/s00138-024-01556-w
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Cross-modal settings that pair 2D images with 3D point clouds in self-supervised representation learning have proven effective for enhancing visual perception. However, different modalities have different data formats and representations, and directly fusing features extracted from cross-modal datasets may lead to conflicting information and feature collapse. We refer to this problem as uncertainty in network learning; reducing uncertainty to obtain trusted descriptions therefore becomes the key to improving network performance. Motivated by this, we propose a trusted cross-modal network for self-supervised learning (TCMSS). It obtains trusted descriptions through a trusted combination module and improves network performance with a well-designed loss function. In the trusted combination module, we use the Dirichlet distribution and subjective logic to parameterize the features and to quantify their uncertainty at the same time. The Dempster-Shafer Theory (DST) is then used to obtain trusted descriptions by weighting the parameterized results with their uncertainty. We have also designed a trusted domain loss, comprising a domain loss and a trusted loss, which effectively improves the prediction accuracy of the network by applying contrastive learning between different feature descriptions. Experimental results show that our model outperforms previous results on linear classification on ScanObjectNN and on few-shot classification on both ModelNet40 and ScanObjectNN. In addition, part segmentation on ShapeNet likewise surpasses previous methods. Further, ablation studies validate the effectiveness of our method for better point cloud understanding.
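To make the trusted combination concrete, below is a minimal sketch (in Python with NumPy, not the authors' code) of the evidential-fusion pattern the abstract describes: per-modality evidence is mapped through a Dirichlet parameterization and subjective logic to belief masses plus an explicit uncertainty mass, and two modalities are fused with a reduced Dempster-Shafer combination rule common in evidential deep learning. All function and variable names are illustrative assumptions; the paper's exact formulation may differ.

```python
# Illustrative sketch of Dirichlet/subjective-logic uncertainty and a reduced
# Dempster-Shafer combination of two modalities. Not the authors' code.
import numpy as np

def subjective_opinion(evidence):
    """Map non-negative evidence for K classes to (belief, uncertainty).

    Dirichlet parameters: alpha_k = e_k + 1; Dirichlet strength: S = sum(alpha);
    belief mass: b_k = e_k / S; uncertainty mass: u = K / S (so sum(b) + u = 1).
    """
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.size
    alpha = evidence + 1.0
    S = alpha.sum()
    belief = evidence / S
    u = K / S
    return belief, u

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two subjective opinions."""
    # Conflict C: belief mass the two modalities assign to contradictory classes.
    C = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 / (1.0 - C)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    return b, u

# Toy example: a confident image branch and an uncertain point-cloud branch.
b_img, u_img = subjective_opinion([9.0, 1.0, 0.5])   # strong evidence -> low u
b_pcd, u_pcd = subjective_opinion([0.5, 0.4, 0.3])   # weak evidence -> high u
b, u = ds_combine(b_img, u_img, b_pcd, u_pcd)
print("fused belief:", b.round(3), "fused uncertainty:", round(u, 3))
```

Under this rule a modality with weak evidence contributes little belief, so a confident branch is not dragged down by an uncertain one; this is the intuition behind weighting the parameterized results by their uncertainty before fusion.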
Pages: 14