Multi-modal unsupervised domain adaptation for semantic image segmentation

Cited by: 16
Authors
Hu, Sijie [1 ]
Bonardi, Fabien [1 ]
Bouchafa, Samia [1 ]
Sidibe, Desire [1 ]
Affiliations
[1] Univ Paris Saclay, Univ Evry, IBISC, F-91020 Evry Courcouronnes, France
Keywords
Unsupervised domain adaptation; Multi-modal learning; Self-supervised learning; Knowledge transfer; Semantic segmentation
DOI
10.1016/j.patcog.2022.109299
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose a novel multi-modal Unsupervised Domain Adaptation (UDA) method for semantic segmentation. Recently, depth has proven to be a relevant property for providing geometric cues that enhance the RGB representation. However, existing UDA methods either process RGB images alone or cultivate depth-awareness through an auxiliary depth estimation task. We argue that geometric cues crucial to semantic segmentation, such as local shape and relative position, are difficult to recover from an auxiliary depth estimation task driven by color (RGB) information alone. In this paper, we propose a novel multi-modal UDA method named MMADT, which takes both RGB and depth images as input. In particular, we design a Depth Fusion Block (DFB) to recalibrate depth information and leverage Depth Adversarial Training (DAT) to bridge the depth discrepancy between the source and target domains. In addition, we propose a self-supervised multi-modal depth estimation assistant network named Geo-Assistant (GA) to align the feature spaces of RGB and depth and to shape the sensitivity of MMADT to depth information. We observe significant performance improvements on multiple synthetic-to-real adaptation benchmarks, i.e., SYNTHIA-to-Cityscapes, GTA5-to-Cityscapes, and SELMA-to-Cityscapes. Additionally, our multi-modal UDA scheme is easy to port to other UDA methods with a consistent performance boost. (c) 2023 Elsevier Ltd. All rights reserved.
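The abstract does not spell out the internals of the Depth Fusion Block, so the following is only a minimal sketch of one plausible recalibration design: a squeeze-and-excitation-style gate in which pooled RGB features rescale depth features channel-wise before fusion back into the RGB stream. The class name, tensor shapes, and the gating choice are assumptions for illustration, not the paper's actual implementation.

import torch
import torch.nn as nn

class DepthFusionSketch(nn.Module):
    """Hypothetical depth-recalibration block (NOT the paper's DFB):
    RGB features produce channel-wise gates that rescale depth features,
    and the rescaled depth is added back to the RGB stream."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average
        self.gate = nn.Sequential(           # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # per-channel weights in (0, 1)
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb_feat.shape
        w = self.gate(self.pool(rgb_feat).view(b, c)).view(b, c, 1, 1)
        return rgb_feat + w * depth_feat     # recalibrated depth fused into RGB

# Example: fuse 256-channel feature maps from parallel RGB and depth encoders.
fuse = DepthFusionSketch(256)
out = fuse(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
print(out.shape)  # torch.Size([2, 256, 32, 32])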
Pages: 12
References
42 in total (first 10 listed)
[1] Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
[2] Chen, T. Proceedings of Machine Learning Research, 2020, Vol. 119.
[3] Chen, Yuhua; Li, Wen; Chen, Xiaoran; Van Gool, Luc. Learning Semantic Segmentation from Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 1841-1850.
[4] Cheng, Yanhua; Cai, Rui; Li, Zhiwei; Zhao, Xin; Huang, Kaiqi. Locality-Sensitive Deconvolution Networks with Gated Fusion for RGB-D Indoor Semantic Segmentation. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 1475-1483.
[5] Cordts, Marius; Omran, Mohamed; Ramos, Sebastian; Rehfeld, Timo; Enzweiler, Markus; Benenson, Rodrigo; Franke, Uwe; Roth, Stefan; Schiele, Bernt. The Cityscapes Dataset for Semantic Urban Scene Understanding. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 3213-3223.
[6] Godard, Clement; Mac Aodha, Oisin; Firman, Michael; Brostow, Gabriel. Digging Into Self-Supervised Monocular Depth Estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 3827-3837.
[7] Guan, Dayan; Huang, Jiaxing; Lu, Shijian; Xiao, Aoran. Scale variance minimization for unsupervised domain adaptation in image segmentation. Pattern Recognition, 2021, 112.
[8] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.
[9] Heo, B. AAAI Conference on Artificial Intelligence, 2019: 3779.
[10] Hinton, G. P INT WORKSH NEUR IN, 2015.