Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation

Cited by: 1
Authors
Wu, Yao [1 ]
Xing, Mingwei [2 ]
Zhang, Yachao [3 ]
Xie, Yuan [4 ,5 ]
Fan, Jianping [6 ]
Shi, Zhongchao [6 ]
Qu, Yanyun [2 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Xiamen Univ, Inst Artificial Intelligence, Xiamen, Peoples R China
[3] Tsinghua Univ, Shenzhen, Peoples R China
[4] East China Normal Univ, Shanghai, Peoples R China
[5] East China Normal Univ, Chongqing Inst, Chongqing, Peoples R China
[6] Lenovo Res, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
3D semantic segmentation; Unsupervised domain adaptation;
DOI
10.1145/3581783.3612013
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal Unsupervised Domain Adaptation (UDA) has become a research hotspot because it reduces the laborious annotation of target-domain samples. Existing methods only mutually mimic cross-modal outputs within each domain, which enforces agreement between the class probability distributions of different domains. However, these methods ignore the complementarity offered by the fused modality representation in cross-modal learning. In this paper, we propose a cross-modal UDA method for 3D semantic segmentation via Bidirectional Fusion-then-Distillation, named BFtD-xMUDA, which explores cross-modal fusion in UDA and achieves distribution consistency between the outputs of the two domains, not only between the 2D image and the 3D point cloud but also between 2D/3D and the fusion. Our method contains three key components: a Model-agnostic Feature Fusion Module (MFFM), Bidirectional Distillation (B-Distill), and Cross-modal Debiased Pseudo-Labeling (xDPL). MFFM generates cross-modal fusion features that establish a latent space enforcing maximum correlation and complementarity between the two heterogeneous modalities. B-Distill exploits bidirectional knowledge distillation, comprising cross-modality and cross-domain fusion distillation, to achieve domain-modality alignment. xDPL models the uncertainty of pseudo-labels within a self-training scheme. Extensive experimental results demonstrate that our method outperforms state-of-the-art competitors in several adaptation scenarios.
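The abstract's core idea (distilling knowledge bidirectionally between 2D, 3D, and fused predictions) can be sketched numerically. The snippet below is a minimal NumPy illustration only, not the paper's implementation: the averaging fusion is a hypothetical stand-in for MFFM, and all function names (`bidirectional_distill_losses`, `kl_div`) are invented for this example. It shows how KL-divergence terms could couple each unimodal prediction both to the other modality and to the fused prediction.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    # Mean KL(p || q) over points; eps guards against log(0).
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def bidirectional_distill_losses(logits_2d, logits_3d):
    """Toy distillation objectives for (num_points, num_classes) logits.

    Hypothetical fusion: average the per-point class logits
    (a stand-in for the paper's MFFM fusion features).
    """
    logits_fuse = 0.5 * (logits_2d + logits_3d)
    p2d, p3d, pf = softmax(logits_2d), softmax(logits_3d), softmax(logits_fuse)
    return {
        # Cross-modal mimicry terms (2D <-> 3D), as in xMUDA-style methods.
        "xm_2d_to_3d": kl_div(p3d, p2d),
        "xm_3d_to_2d": kl_div(p2d, p3d),
        # Fusion-to-unimodal terms: each modality is pulled toward the fusion.
        "fuse_to_2d": kl_div(p2d, pf),
        "fuse_to_3d": kl_div(p3d, pf),
    }

rng = np.random.default_rng(0)
losses = bidirectional_distill_losses(rng.normal(size=(5, 4)),
                                      rng.normal(size=(5, 4)))
```

In a real training loop these terms would be weighted and summed into the total loss alongside the segmentation and pseudo-label objectives; the cross-domain part of B-Distill (applying such terms across source and target) is omitted here for brevity.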
Pages: 490-498
Page count: 9