CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1
Authors
Liu, Peng [1]
Ge, Yanqi [2]
Duan, Lixin [1, 3]
Li, Wen [2]
Lv, Fengmao [4, 5]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China
[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; domain adaptation; semantic segmentation;
DOI
10.1109/TII.2024.3412006
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, but training such models requires a large number of pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since obtaining pixel-level annotations is fairly easy in a synthetic environment. Recently, a growing number of UDA approaches have improved cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which explicitly uses depth information to align the main-backbone RGB features of both domains in a nonadversarial manner. In particular, considering that the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending them to the depth modality through an auxiliary multimodal segmentation task. CAFA achieves state-of-the-art performance on benchmark tasks such as Synthia -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.
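The idea of attending RGB features to the depth modality can be illustrated with a generic scaled dot-product cross-attention. The sketch below is only an illustration under stated assumptions, not the architecture from the paper: the function name `cross_modal_attention`, the toy feature vectors, and the absence of learned query/key/value projections are all hypothetical simplifications. RGB feature vectors act as queries, depth feature vectors act as keys and values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(rgb_feats, depth_feats):
    """Hypothetical helper: attend each RGB vector (query) to depth vectors (keys/values).

    No learned projections are used; a real model would apply trainable
    query/key/value maps before this step.
    """
    dim = len(rgb_feats[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in rgb_feats:
        # Similarity of this RGB query to every depth key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in depth_feats]
        weights = softmax(scores)
        # Weighted sum of the depth values.
        out = [sum(w * v[j] for w, v in zip(weights, depth_feats)) for j in range(dim)]
        attended.append(out)
    return attended


# Toy inputs (made up for illustration): three RGB tokens, two depth tokens.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_modal_attention(rgb, depth)
```

Each output row is a convex combination of the depth vectors, weighted by how strongly the corresponding RGB vector matches each depth vector. How the paper uses such attended features inside its auxiliary multimodal segmentation task is not specified in the abstract.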
Pages: 11666-11675
Page count: 10