Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation

Cited by: 12
Authors
Hoyer, Lukas [1]
Dai, Dengxin [2]
Van Gool, Luc [1, 3, 4]
Affiliations
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
[2] Huawei Zurich Res Ctr, CH-8050 Zurich, Switzerland
[3] Katholieke Univ Leuven, B-3000 Leuven, Belgium
[4] INSAIT, Sofia 1784, Bulgaria
Keywords
Domain adaptation; domain generalization; semantic segmentation; transformers; high-resolution; multi-resolution
DOI
10.1109/TPAMI.2023.3320613
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA and DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA and DG. It is enabled by three training strategies that avoid overfitting to the source domain: (1) Rare Class Sampling mitigates the bias toward common source domain classes, while (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. As UDA and DG are usually GPU memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details, while models trained with cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA and DG that uses a learned scale attention to combine the strengths of small high-resolution crops, which preserve fine segmentation details, and large low-resolution crops, which capture long-range context dependencies. DAFormer and HRDA significantly improve the state of the art in UDA and DG by more than 10 mIoU on 5 different benchmarks.
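Two of the ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the softmax-with-temperature form of the sampling probability, the per-pixel convex blend, the function names, and the example class frequencies are all assumptions made for illustration. The first function gives rarer classes a higher chance of being sampled, in the spirit of Rare Class Sampling; the second blends a low-resolution context prediction with a high-resolution detail prediction using an attention weight, in the spirit of HRDA's scale attention.

```python
import math


def rare_class_probs(class_freqs, temperature=0.5):
    """Map class frequencies to sampling probabilities.

    Assumed form: softmax over (1 - frequency) / temperature, so that
    classes with a lower frequency receive a higher probability.
    """
    logits = [(1.0 - f) / temperature for f in class_freqs.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(class_freqs, exps)}


def fuse_scores(context, detail, attention):
    """Blend two per-pixel score lists with an attention weight in [0, 1].

    attention = 0 keeps the low-resolution context score,
    attention = 1 keeps the high-resolution detail score.
    """
    return [(1.0 - a) * c + a * d
            for c, d, a in zip(context, detail, attention)]


# Hypothetical pixel-level class frequencies in a source dataset.
freqs = {"road": 0.6, "car": 0.3, "bicycle": 0.1}
probs = rare_class_probs(freqs, temperature=0.1)

# Hypothetical scores for two pixels and their attention weights.
fused = fuse_scores([0.2, 0.8], [0.6, 0.4], [0.0, 1.0])
```

In this sketch a smaller temperature concentrates sampling on the rarest classes, while a larger one approaches uniform sampling. In a real model the attention weights would be predicted by a learned module rather than supplied by hand.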
Pages: 220-235
Page count: 16