A Contrastive Learning Method for the Visual Representation of 3D Point Clouds

Cited by: 2
Authors
Zhu, Feng [1]
Zhao, Jieyu [1,2]
Cai, Zhengyi [1]
Affiliations
[1] Ningbo Univ, Fac Elect Engn & Comp Sci, Ningbo 315211, Peoples R China
[2] Mobile Network Applicat Technol Key Lab Zhejiang, Ningbo 315211, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
unsupervised learning; contrastive learning; capsule network;
DOI
10.3390/a15030089
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current unsupervised visual-representation learning for point cloud models relies mainly on generative methods, which focus too heavily on reconstructing the details of each point and therefore neglect semantic information. This paper proposes a discriminative, contrastive-learning method for three-dimensional point cloud visual representations that can effectively learn representations of point cloud models. A self-attention point cloud capsule network is designed as the backbone to extract features from point cloud data. Compressing the digit-capsule layer removes the class dependence of the features, improving both the generalization ability of the model and the capacity of the feature queue to store features. Exploiting the equivariance of the capsule network, a Jaccard loss function is constructed that helps the network distinguish the features of positive and negative samples, thereby improving contrastive-learning performance. The model is pre-trained on the ShapeNetCore dataset, and the pre-trained model is then used for classification and segmentation tasks. Classification accuracy on the ModelNet40 dataset is 0.1% higher than that of the best unsupervised method, PointCapsNet, and exceeds 80% when only 10% of the label data is used. The mIoU of part segmentation on the ShapeNet dataset is 1.2% higher than that of the best comparison method, MulUnsupervised. These classification and segmentation results show that the proposed method achieves good accuracy, and the alignment and uniformity of its features surpass those of the generative method PointCapsNet, demonstrating that this method learns visual representations of three-dimensional point cloud models more effectively.
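The record does not give the exact form of the Jaccard loss or the feature queue, so the following is only a minimal PyTorch sketch of the general idea: an InfoNCE-style contrastive loss in which the usual cosine similarity is replaced by a soft Jaccard similarity between (assumed non-negative) feature vectors, with negatives drawn from a stored feature queue. All function names, the queue shape, and the temperature value here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def jaccard_similarity(a, b, eps=1e-8):
    # Soft Jaccard similarity for non-negative feature vectors:
    # J(a, b) = sum(min(a, b)) / sum(max(a, b)); broadcasts over batch dims.
    inter = torch.minimum(a, b).sum(dim=-1)
    union = torch.maximum(a, b).sum(dim=-1)
    return inter / (union + eps)

def jaccard_contrastive_loss(query, positive, queue, temperature=0.07):
    # query, positive: (B, D) features of two augmented views of the same cloud.
    # queue: (K, D) negative features kept in a feature queue (MoCo-style).
    # Features are assumed non-negative (e.g. after a ReLU), as soft Jaccard
    # similarity requires.
    pos = jaccard_similarity(query, positive)                            # (B,)
    neg = jaccard_similarity(query.unsqueeze(1), queue.unsqueeze(0))     # (B, K)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1) / temperature
    # The positive pair sits at column 0 of the logits for every query.
    labels = torch.zeros(query.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```

A plain cosine-similarity InfoNCE loss could be swapped in by replacing `jaccard_similarity`; the Jaccard form is bounded in [0, 1] and compares feature vectors element-wise rather than only by angle.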
Pages: 17