Cross-Domain and Cross-Modal Knowledge Distillation in Domain Adaptation for 3D Semantic Segmentation

Cited by: 11
Authors
Li, Miaoyu [1 ]
Zhang, Yachao [1 ]
Xie, Yuan [2 ]
Gao, Zuodong [1 ]
Li, Cuihua [1 ]
Zhang, Zhizhong [2 ]
Qu, Yanyun [1 ]
Affiliations
[1] Xiamen Univ, Xiamen, Peoples R China
[2] East China Normal Univ, Shanghai, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Unsupervised domain adaptation; Target-aware teacher network; Cross-domain knowledge distillation; 3D Semantic segmentation; Multi-modal style transfer;
DOI
10.1145/3503161.3547990
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203 ; 0835 ;
Abstract
With the emergence of multi-modal datasets in which LiDAR and camera are synchronized and calibrated, cross-modal Unsupervised Domain Adaptation (UDA) has attracted increasing attention because it reduces the laborious annotation of target-domain samples. To alleviate the distribution gap between source and target domains, existing methods align features with adversarial learning, which is, however, notoriously sensitive to hyperparameters and difficult to train. In this paper, we propose a novel model (Dual-Cross) that integrates Cross-Domain Knowledge Distillation (CDKD) and Cross-Modal Knowledge Distillation (CMKD) to mitigate domain shift. Specifically, we design a multi-modal style transfer that converts the source image and point cloud to the target style. With these synthetic samples as input, we introduce a target-aware teacher network to learn knowledge of the target domain. We then perform dual-cross knowledge distillation while the student learns on the source domain. CDKD constrains teacher and student predictions under the same modality to be consistent; it transfers target-aware knowledge from the teacher to the student, making the student more adaptive to the target domain. CMKD generates a hybrid-modal prediction from the teacher predictions and constrains it to be consistent with both the 2D and 3D student predictions; it promotes information interaction between the two modalities so that they complement each other. Evaluation on various domain adaptation settings shows that Dual-Cross significantly outperforms both uni-modal and cross-modal state-of-the-art methods.
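The dual-cross distillation described in the abstract can be sketched as a pair of consistency losses. The following is a hedged illustration only, not the authors' implementation: the names `teacher_2d`, `teacher_3d`, `student_2d`, `student_3d`, the KL-divergence consistency measure, and the equal-weight average used to form the hybrid-modal prediction are all assumptions for the sketch. CDKD matches same-modality teacher and student predictions; CMKD builds a hybrid-modal target from the teacher's 2D and 3D outputs and matches it against both student heads.

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """KL(p || q) averaged over points; p, q are (N, C) class-probability arrays."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

def dual_cross_loss(teacher_2d, teacher_3d, student_2d, student_3d,
                    lam_cdkd=1.0, lam_cmkd=1.0):
    """Illustrative dual-cross distillation loss (hypothetical weighting).

    CDKD: same-modality teacher -> student consistency.
    CMKD: hybrid-modal teacher target vs. both student modalities.
    """
    cdkd = kl_div(teacher_2d, student_2d) + kl_div(teacher_3d, student_3d)
    hybrid = 0.5 * (teacher_2d + teacher_3d)  # hybrid-modal teacher prediction
    cmkd = kl_div(hybrid, student_2d) + kl_div(hybrid, student_3d)
    return lam_cdkd * cdkd + lam_cmkd * cmkd

def softmax(x):
    """Row-wise softmax for toy logits."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# toy check: identical teacher/student predictions give zero loss
rng = np.random.default_rng(0)
p = softmax(rng.standard_normal((4, 3)))
print(round(dual_cross_loss(p, p, p, p), 6))  # → 0.0
```

In this sketch, minimizing the CDKD terms pulls each student modality toward its target-aware teacher counterpart, while the CMKD terms pull both modalities toward a shared hybrid target, which is one way the 2D and 3D branches could be made to complement each other.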
Pages: 3829-3837 (9 pages)