Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation

Cited by: 93
Authors
Hou, Yuenan [1]
Zhu, Xinge [2]
Ma, Yuexin [3]
Loy, Chen Change [4]
Li, Yikang [1]
Affiliations
[1] Shanghai AI Lab, Shanghai, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[3] ShanghaiTech Univ, Shanghai, Peoples R China
[4] Nanyang Technol Univ, S Lab, Singapore, Singapore
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022
DOI
10.1109/CVPR52688.2022.00829
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This article addresses the problem of distilling knowledge from a large teacher model to a slim student network for LiDAR semantic segmentation. Directly employing previous distillation approaches yields inferior results due to the intrinsic challenges of point clouds, i.e., sparsity, randomness and varying density. To tackle these problems, we propose Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge at both the point level and the voxel level. Specifically, we first leverage both pointwise and voxelwise output distillation to complement the sparse supervision signals. Then, to better exploit the structural information, we divide the whole point cloud into several supervoxels and design a difficulty-aware sampling strategy that samples supervoxels containing less-frequent classes and faraway objects more often. On these supervoxels, we propose inter-point and inter-voxel affinity distillation, where the similarity information between points and voxels can help the student model better capture the structural information of the surrounding environment. We conduct extensive experiments on two popular LiDAR segmentation benchmarks, i.e., nuScenes [3] and SemanticKITTI [1]. On both benchmarks, our PVD consistently outperforms previous distillation approaches by a large margin on three representative backbones, i.e., Cylinder3D [36, 37], SPVNAS [25] and MinkowskiNet [5]. Notably, on the challenging nuScenes and SemanticKITTI datasets, our method can achieve roughly 75% MACs reduction and a 2x speedup on the competitive Cylinder3D model, and it ranks 1st on the SemanticKITTI leaderboard among all published algorithms(1). Our code is available at https://github.com/cardwing/Codes-for-PVKD.
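The abstract names three distillation signals (pointwise/voxelwise output distillation, difficulty-aware supervoxel sampling, and inter-point/inter-voxel affinity distillation) but gives no formulas. The pure-Python sketch below illustrates one plausible reading of each under explicit assumptions; it is not the authors' implementation, which lives in the repository linked in the abstract. Assumed here: output distillation is a temperature-softened KL divergence from teacher to student; a voxel's logits are the average of its points' logits (the real networks may produce voxel outputs directly); affinity is the cosine-similarity matrix of the features inside one supervoxel, compared with a mean-squared error; and the difficulty score in `sampling_weights` is a hypothetical stand-in for the paper's sampling strategy. All function names are illustrative.

```python
import math
import random
from collections import defaultdict


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of one logit vector, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def pointwise_kd(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over points, one logit vector per point."""
    losses = [kl_div(softmax(t, temperature), softmax(s, temperature))
              for t, s in zip(teacher_logits, student_logits)]
    return sum(losses) / len(losses)


def voxel_mean_logits(point_logits, voxel_ids):
    """ASSUMPTION: a voxel's logits are the average of the logits of its points."""
    sums, counts = {}, defaultdict(int)
    for logits, vid in zip(point_logits, voxel_ids):
        acc = sums.setdefault(vid, [0.0] * len(logits))
        sums[vid] = [a + b for a, b in zip(acc, logits)]
        counts[vid] += 1
    return {vid: [x / counts[vid] for x in acc] for vid, acc in sums.items()}


def voxelwise_kd(teacher_logits, student_logits, voxel_ids, temperature=1.0):
    """Same KL loss as pointwise_kd, applied to voxel-averaged logits."""
    t_vox = voxel_mean_logits(teacher_logits, voxel_ids)
    s_vox = voxel_mean_logits(student_logits, voxel_ids)
    keys = list(t_vox)  # both dicts were built from the same voxel_ids, same order
    return pointwise_kd([t_vox[k] for k in keys], [s_vox[k] for k in keys], temperature)


def cosine(a, b, eps=1e-12):
    """Cosine similarity; eps keeps zero vectors from dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def affinity(features):
    """N x N cosine-similarity matrix between the N feature vectors of one supervoxel."""
    return [[cosine(fi, fj) for fj in features] for fi in features]


def affinity_kd(teacher_feats, student_feats):
    """Mean squared difference between teacher and student affinity matrices."""
    a_t, a_s = affinity(teacher_feats), affinity(student_feats)
    n = len(a_t)
    return sum((a_t[i][j] - a_s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def sampling_weights(supervoxels, class_freq, max_range):
    """HYPOTHETICAL difficulty score: rarer classes and farther supervoxels weigh more.

    Each supervoxel is (class labels it contains, distance from the sensor in metres).
    class_freq maps a class label to its fraction of all labelled points.
    """
    weights = []
    for labels, dist in supervoxels:
        rarity = max(1.0 - class_freq[c] for c in labels)
        weights.append(1.0 + rarity + dist / max_range)
    return weights


def sample_supervoxels(supervoxels, class_freq, max_range, k, seed=0):
    """Draw k supervoxel indices with replacement, proportional to their weights."""
    rng = random.Random(seed)
    weights = sampling_weights(supervoxels, class_freq, max_range)
    return rng.choices(range(len(supervoxels)), weights=weights, k=k)
```

For one scan, `pointwise_kd(t, s) + voxelwise_kd(t, s, voxel_ids)` would give the output-distillation term, and `affinity_kd` would be evaluated on the point or voxel features of each supervoxel drawn by `sample_supervoxels`. Because an affinity matrix is N x N whatever the feature width, the teacher and the slim student need not have the same number of channels for this comparison.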
Pages: 8469-8478
Page count: 10
Related papers (37 in total)
  • [1] SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
    Behley, Jens
    Garbade, Martin
    Milioto, Andres
    Quenzel, Jan
    Behnke, Sven
    Stachniss, Cyrill
    Gall, Juergen
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, pp. 9296-9306
  • [2] The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks
    Berman, Maxim
    Triki, Amal Rannen
    Blaschko, Matthew B.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, pp. 4413-4421
  • [3] Caesar, H. PROC CVPR IEEE, 2020, p. 11618. DOI 10.1109/CVPR42600.2020.01164
  • [4] Xu, Chenfeng. Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12373), 2020, p. 1. DOI 10.1007/978-3-030-58604-1_1
  • [5] (AF)²-S3Net: Attentive Feature Fusion with Adaptive Feature Selection for Sparse Semantic Segmentation Network
    Cheng, Ran
    Razani, Ryan
    Taghavi, Ehsan
    Li, Enxu
    Liu, Bingbing
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021, pp. 12542-12551
  • [6] 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
    Choy, Christopher
    Gwak, JunYoung
    Savarese, Silvio
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, pp. 3070-3079
  • [7] Cortinhal, Tiago. Advances in Visual Computing. 15th International Symposium, ISVC 2020. Proceedings. Lecture Notes in Computer Science (LNCS 12510), 2020, p. 207. DOI 10.1007/978-3-030-64559-5_16
  • [8] Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
    Dai, Angela
    Qi, Charles Ruizhongtai
    Niessner, Matthias
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, pp. 6545-6554
  • [9] Zhang, Feihu. Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12369), 2020, p. 644. DOI 10.1007/978-3-030-58586-0_38
  • [10] TORNADO-Net: mulTiview tOtal vaRiatioN semAntic segmentation with Diamond inceptiOn module
    Gerdzhev, Martin
    Razani, Ryan
    Taghavi, Ehsan
    Liu, Bingbing
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, pp. 9543-9549