Boosting Cross-Domain Point Classification via Distilling Relational Priors From 2D Transformers

Citations: 0
Authors
Zou, Longkun [1 ,2 ]
Zhu, Wanru [1 ]
Chen, Ke [2 ]
Guo, Lihua [1 ]
Guo, Kailing [1 ]
Jia, Kui [3 ]
Wang, Yaowei [2 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China
[2] Pengcheng Lab, Shenzhen 518000, Peoples R China
[3] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen CUHK Shenzhen, Shenzhen 518000, Peoples R China
Keywords
Point cloud compression; Three-dimensional displays; Transformers; Solid modeling; Training; Task analysis; Shape; Unsupervised domain adaptation; point clouds; relational priors; cross-modal; knowledge distillation
DOI
10.1109/TCSVT.2024.3440517
CLC Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline Codes
0808 ; 0809 ;
Abstract
The semantic pattern of an object point cloud is determined by the topological configuration of its local geometries. Learning discriminative representations can be challenging due to large shape variations of point sets in local regions and incomplete surfaces from a global perspective, and these difficulties become even more severe in the context of unsupervised domain adaptation (UDA). Specifically, traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries, which greatly limits their cross-domain generalization. Recently, transformer-based models have achieved impressive performance gains in a range of image-based tasks, benefiting from the strong generalization capability and scalability that stem from capturing long-range correlations across local patches. Inspired by these successes of visual transformers, we propose a novel Relational Priors Distillation (RPD) method that extracts relational priors from transformers well-trained on massive image collections, which can significantly empower cross-domain representations with consistent topological priors of objects. To this end, we establish a parameter-frozen pre-trained transformer module shared between the 2D teacher and 3D student models, complemented by an online knowledge distillation strategy that semantically regularizes the 3D student model. Furthermore, we introduce a novel self-supervised task centered on reconstructing masked point cloud patches from the corresponding masked multi-view image features, thereby enabling the model to incorporate 3D geometric information. Experiments on the PointDA-10 and Sim-to-Real benchmarks verify that the proposed method consistently achieves state-of-the-art UDA performance for point cloud classification. The source code of this work is available at https://github.com/zou-longkun/RPD.git.
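The abstract combines two training signals: a soft-label knowledge distillation loss between the 2D teacher and the 3D student, and a masked-patch reconstruction loss against multi-view image features. The sketch below is only an illustration of those two loss families in their standard forms (Hinton-style temperature-softened KL for distillation, masked mean-squared error for reconstruction); the function names, temperature value, and exact formulations are assumptions, not the authors' released implementation.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Standard soft-label distillation: KL(teacher || student) on
    temperature-softened class distributions, scaled by T^2 so the
    gradient magnitude is roughly independent of T."""
    p = softmax(teacher_logits, T)  # teacher's softened distribution
    q = softmax(student_logits, T)  # student's softened distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * T * T

def masked_reconstruction_loss(pred_feats, target_feats, mask):
    """Mean squared error computed only over masked patches: pred_feats
    are the student's reconstructed patch features, target_feats the
    corresponding multi-view image features, mask is 1 where a patch
    was masked out."""
    total, count = 0.0, 0
    for f, g, m in zip(pred_feats, target_feats, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(f, g)) / len(f)
            count += 1
    return total / max(count, 1)
```

In practice the two terms would be summed (with weighting hyperparameters) alongside the supervised classification loss on the source domain; the weighting scheme here is left unspecified, as the abstract does not state it.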
Pages: 12963-12976
Page count: 14