Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation

Cited by: 1
Authors
Wu, Yao [1 ]
Xing, Mingwei [2 ]
Zhang, Yachao [3 ]
Xie, Yuan [4 ,5 ]
Fan, Jianping [6 ]
Shi, Zhongchao [6 ]
Qu, Yanyun [2 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Xiamen Univ, Inst Artificial Intelligence, Xiamen, Peoples R China
[3] Tsinghua Univ, Shenzhen, Peoples R China
[4] East China Normal Univ, Shanghai, Peoples R China
[5] East China Normal Univ, Chongqing Inst, Chongqing, Peoples R China
[6] Lenovo Res, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
3D semantic segmentation; Unsupervised domain adaptation;
DOI
10.1145/3581783.3612013
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal Unsupervised Domain Adaptation (UDA) has become a research hotspot because it reduces the laborious annotation of target-domain samples. Existing methods only mutually mimic cross-modal outputs within each domain, which enforces agreement between the class probability distributions of different domains. However, these methods ignore the complementarity offered by the fused modality representation in cross-modal learning. In this paper, we propose a cross-modal UDA method for 3D semantic segmentation via Bidirectional Fusion-then-Distillation, named BFtD-xMUDA, which explores cross-modal fusion in UDA and achieves distribution consistency between the outputs of the two domains, not only between the 2D image and the 3D point cloud but also between 2D/3D and the fusion. Our method contains three key components: a Model-agnostic Feature Fusion Module (MFFM), Bidirectional Distillation (B-Distill), and Cross-modal Debiased Pseudo-Labeling (xDPL). MFFM generates cross-modal fusion features that establish a latent space enforcing maximum correlation and complementarity between the two heterogeneous modalities. B-Distill exploits bidirectional knowledge distillation, comprising cross-modality and cross-domain fusion distillation, to achieve domain-modality alignment. xDPL models the uncertainty of pseudo-labels within a self-training scheme. Extensive experimental results demonstrate that our method outperforms state-of-the-art competitors in several adaptation scenarios.
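The abstract's core idea (distilling knowledge bidirectionally between 2D, 3D, and fused predictions) can be sketched numerically. The snippet below is a minimal NumPy illustration only, not the paper's implementation: the averaging fusion is a hypothetical stand-in for MFFM, and all function names (`bidirectional_distill_losses`, `kl_div`) are invented for this example. It shows how KL-divergence terms could couple each unimodal prediction both to the other modality and to the fused prediction.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    # Mean KL(p || q) over points; eps guards against log(0).
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def bidirectional_distill_losses(logits_2d, logits_3d):
    """Toy distillation objectives for (num_points, num_classes) logits.

    Hypothetical fusion: average the per-point class logits
    (a stand-in for the paper's MFFM fusion features).
    """
    logits_fuse = 0.5 * (logits_2d + logits_3d)
    p2d, p3d, pf = softmax(logits_2d), softmax(logits_3d), softmax(logits_fuse)
    return {
        # Cross-modal mimicry terms (2D <-> 3D), as in xMUDA-style methods.
        "xm_2d_to_3d": kl_div(p3d, p2d),
        "xm_3d_to_2d": kl_div(p2d, p3d),
        # Fusion-to-unimodal terms: each modality is pulled toward the fusion.
        "fuse_to_2d": kl_div(p2d, pf),
        "fuse_to_3d": kl_div(p3d, pf),
    }

rng = np.random.default_rng(0)
losses = bidirectional_distill_losses(rng.normal(size=(5, 4)),
                                      rng.normal(size=(5, 4)))
```

In a real training loop these terms would be weighted and summed into the total loss alongside the segmentation and pseudo-label objectives; the cross-domain part of B-Distill (applying such terms across source and target) is omitted here for brevity.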
Pages: 490-498
Page count: 9