PointMCD: Boosting Deep Point Cloud Encoders via Multi-View Cross-Modal Distillation for 3D Shape Recognition

Cited by: 7
Authors
Zhang, Qijian [1]
Hou, Junhui [1]
Qian, Yue [1]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
Three-dimensional displays; Shape; Point cloud compression; Solid modeling; Visualization; Feature extraction; Task analysis; 3D point cloud; multi-view images; cross-modal; knowledge distillation; 3D shape recognition
DOI
10.1109/TMM.2023.3286981
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As two fundamental representation modalities of 3D objects, 3D point clouds and multi-view 2D images record shape information from the different domains of geometric structure and visual appearance. In the current deep learning era, remarkable progress in processing these two data modalities has been achieved by customizing compatible 3D and 2D network architectures, respectively. However, unlike multi-view image-based 2D visual modeling paradigms, which have shown leading performance on several common 3D shape recognition benchmarks, point cloud-based 3D geometric modeling paradigms are still highly limited by insufficient learning capacity, owing to the difficulty of extracting discriminative features from irregular geometric signals. In this article, we explore the possibility of boosting deep 3D point cloud encoders by transferring visual knowledge extracted from deep 2D image encoders under a standard teacher-student distillation workflow. Specifically, we propose PointMCD, a unified multi-view cross-modal distillation architecture that includes a pretrained deep image encoder as the teacher and a deep point encoder as the student. To perform heterogeneous feature alignment between the 2D visual and 3D geometric domains, we further investigate visibility-aware feature projection (VAFP), by which point-wise embeddings are aggregated into view-specific geometric descriptors. By aligning multi-view visual and geometric descriptors pairwise, we can obtain more powerful deep point encoders without exhaustive and complicated network modification. Experiments on 3D shape classification, part segmentation, and unsupervised learning strongly validate the effectiveness of our method.
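The abstract describes the alignment step only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that per-view visibility masks are already available, that point-wise embeddings are max-pooled over the visible points of each view, and that paired descriptors are compared with a mean squared error; none of these choices is stated in the abstract, and the names `view_descriptors` and `alignment_loss` are hypothetical.

```python
import numpy as np


def view_descriptors(point_feats, visibility):
    """Pool point-wise embeddings (N, C) into per-view descriptors (V, C).

    `visibility` is a boolean (V, N) mask marking which points each view sees.
    Max-pooling over visible points is an assumption made for this sketch.
    """
    point_feats = np.asarray(point_feats, dtype=np.float64)
    visibility = np.asarray(visibility, dtype=bool)
    # Broadcast to (V, N, C) and hide invisible points behind -inf before the max.
    masked = np.where(visibility[:, :, None], point_feats[None, :, :], -np.inf)
    pooled = masked.max(axis=1)
    # A view that sees no points would yield -inf; fall back to zeros there.
    empty = ~visibility.any(axis=1)
    pooled[empty] = 0.0
    return pooled


def alignment_loss(geom_desc, visual_desc):
    """Mean squared error between paired per-view descriptors (assumed loss)."""
    geom_desc = np.asarray(geom_desc, dtype=np.float64)
    visual_desc = np.asarray(visual_desc, dtype=np.float64)
    return float(np.mean((geom_desc - visual_desc) ** 2))
```

In a teacher-student setup of the kind the abstract outlines, `geom_desc` would come from the student point encoder through `view_descriptors`, while `visual_desc` would come from the frozen image encoder applied to the rendering of each view, and the loss would be minimized with respect to the student only.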
Pages: 754-767
Page count: 14