SnapshotNet: Self-supervised feature learning for point cloud data segmentation using minimal labeled data

Cited: 7
Authors
Li, Xingye [1 ]
Zhang, Ling [1 ]
Zhu, Zhigang [1 ,2 ]
Affiliations
[1] CUNY City Coll, 160 Convent Ave, New York, NY 10031 USA
[2] CUNY, Grad Ctr, 365 5th Ave, New York, NY 10016 USA
Funding
US National Science Foundation;
Keywords
Self-supervision; Point cloud; Semantic segmentation;
DOI
10.1016/j.cviu.2021.103339
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Manually annotating complex scene point cloud datasets is both costly and error-prone. To reduce reliance on labeled data, a new model called SnapshotNet is proposed as a self-supervised feature learning approach that works directly on the unlabeled point cloud data of a complex 3D scene. The SnapshotNet pipeline includes three stages. In the snapshot capturing stage, snapshots, defined as local collections of points, are sampled from the point cloud scene. A snapshot could be a view of a local 3D scan captured directly from the real scene, or a virtual view of such a scan taken from a large 3D point cloud dataset. Snapshots can also be sampled at different sampling rates or fields of view (FOVs), yielding multi-FOV snapshots that capture scale information from the scene. In the feature learning stage, a new pretext task called multi-FOV contrasting is proposed to recognize whether two snapshots come from the same object or not, within the same FOV or across different FOVs. Snapshots go through two self-supervised learning steps: a contrastive learning step with both part contrasting and scale contrasting, followed by a snapshot clustering step that extracts higher-level semantic features. A weakly-supervised segmentation stage is then implemented by first training a standard SVM classifier on the learned features with a small fraction of labeled snapshots. The trained SVM is then used to predict labels for input snapshots, and the predicted labels are converted into point-wise label assignments for semantic segmentation of the entire scene via a voting procedure. Experiments are conducted on the Semantic3D dataset, and the results show that the proposed method can learn effective features from snapshots of complex scene data without any labels. Moreover, the proposed weakly-supervised method shows advantages compared with a state-of-the-art method for weakly-supervised point cloud semantic segmentation.
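The weakly-supervised stage described above (SVM on learned features, then snapshot-to-point label voting) can be sketched as follows. This is a minimal illustration with synthetic data, not the paper's implementation: the feature dimension, class count, labeled fraction, and the point-to-snapshot mapping are all hypothetical placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 snapshots, each a 32-D learned feature vector,
# drawn from 3 semantic classes (all sizes here are illustrative only).
n_snapshots, feat_dim, n_classes = 200, 32, 3
true_labels = rng.integers(0, n_classes, n_snapshots)
centers = rng.normal(size=(n_classes, feat_dim))
features = centers[true_labels] + 0.1 * rng.normal(size=(n_snapshots, feat_dim))

# Weakly-supervised step: fit a standard SVM on a small labeled fraction.
labeled = rng.choice(n_snapshots, size=20, replace=False)  # ~10% labeled
svm = SVC(kernel="rbf").fit(features[labeled], true_labels[labeled])
snapshot_pred = svm.predict(features)

# Voting step: a point may lie inside several overlapping snapshots; it takes
# the majority label among the snapshots that contain it (membership is
# simulated here; in practice it comes from the snapshot sampling geometry).
n_points = 50
point_to_snapshots = [rng.choice(n_snapshots, size=5, replace=False)
                      for _ in range(n_points)]
point_labels = np.array([
    np.bincount(snapshot_pred[s], minlength=n_classes).argmax()
    for s in point_to_snapshots
])
```

The voting converts per-snapshot predictions into a dense point-wise labeling without ever needing point-level annotations for training.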
Pages: 14