Cross-Domain and Cross-Modal Knowledge Distillation in Domain Adaptation for 3D Semantic Segmentation

Cited by: 11
Authors
Li, Miaoyu [1 ]
Zhang, Yachao [1 ]
Xie, Yuan [2 ]
Gao, Zuodong [1 ]
Li, Cuihua [1 ]
Zhang, Zhizhong [2 ]
Qu, Yanyun [1 ]
Affiliations
[1] Xiamen Univ, Xiamen, Peoples R China
[2] East China Normal Univ, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Unsupervised domain adaptation; Target-aware teacher network; Cross-domain knowledge distillation; 3D Semantic segmentation; Multi-modal style transfer;
DOI
10.1145/3503161.3547990
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203 ; 0835 ;
Abstract
With the emergence of multi-modal datasets in which LiDAR and camera are synchronized and calibrated, cross-modal Unsupervised Domain Adaptation (UDA) has attracted increasing attention because it reduces the laborious annotation of target-domain samples. To alleviate the distribution gap between source and target domains, existing methods align features with adversarial learning, which is, however, notoriously sensitive to hyperparameters and difficult to train. In this paper, we propose a novel model (Dual-Cross) that integrates Cross-Domain Knowledge Distillation (CDKD) and Cross-Modal Knowledge Distillation (CMKD) to mitigate domain shift. Specifically, we design a multi-modal style transfer that converts the source image and point cloud to the target style. With these synthetic samples as input, we introduce a target-aware teacher network to learn knowledge of the target domain. We then perform dual-cross knowledge distillation while the student learns on the source domain. CDKD constrains teacher and student predictions under the same modality to be consistent; it transfers target-aware knowledge from the teacher to the student, making the student more adaptive to the target domain. CMKD generates a hybrid-modal prediction from the teacher predictions and constrains it to be consistent with both the 2D and 3D student predictions; it promotes information interaction between the two modalities so that they complement each other. Evaluation on various domain adaptation settings shows that Dual-Cross significantly outperforms both uni-modal and cross-modal state-of-the-art methods.
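The dual-cross distillation described in the abstract can be sketched as a pair of consistency losses. The following is a hedged illustration only, not the authors' implementation: the names `teacher_2d`, `teacher_3d`, `student_2d`, `student_3d`, the KL-divergence consistency measure, and the equal-weight average used to form the hybrid-modal prediction are all assumptions for the sketch. CDKD matches same-modality teacher and student predictions; CMKD builds a hybrid-modal target from the teacher's 2D and 3D outputs and matches it against both student heads.

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """KL(p || q) averaged over points; p, q are (N, C) class-probability arrays."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

def dual_cross_loss(teacher_2d, teacher_3d, student_2d, student_3d,
                    lam_cdkd=1.0, lam_cmkd=1.0):
    """Illustrative dual-cross distillation loss (hypothetical weighting).

    CDKD: same-modality teacher -> student consistency.
    CMKD: hybrid-modal teacher target vs. both student modalities.
    """
    cdkd = kl_div(teacher_2d, student_2d) + kl_div(teacher_3d, student_3d)
    hybrid = 0.5 * (teacher_2d + teacher_3d)  # hybrid-modal teacher prediction
    cmkd = kl_div(hybrid, student_2d) + kl_div(hybrid, student_3d)
    return lam_cdkd * cdkd + lam_cmkd * cmkd

def softmax(x):
    """Row-wise softmax for toy logits."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# toy check: identical teacher/student predictions give zero loss
rng = np.random.default_rng(0)
p = softmax(rng.standard_normal((4, 3)))
print(round(dual_cross_loss(p, p, p, p), 6))  # → 0.0
```

In this sketch, minimizing the CDKD terms pulls each student modality toward its target-aware teacher counterpart, while the CMKD terms pull both modalities toward a shared hybrid target, which is one way the 2D and 3D branches could be made to complement each other.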
Pages: 3829-3837 (9 pages)