CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1
Authors
Liu, Peng [1]
Ge, Yanqi [2]
Duan, Lixin [1,3]
Li, Wen [2]
Lv, Fengmao [4,5]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China
[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; domain adaptation; semantic segmentation;
DOI
10.1109/TII.2024.3412006
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, yet training such models requires substantial pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since pixel-level annotations are easy to obtain in a synthetic environment. Recently, a growing number of UDA approaches have promoted cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which provides an explicit way to use depth information to align the main backbone RGB features of both domains in a nonadversarial manner. In particular, because the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending them to the depth modality through an auxiliary multimodal segmentation task. Our CAFA achieves state-of-the-art performance on benchmark tasks such as SYNTHIA -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.
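The attentive alignment described in the abstract can be pictured with a minimal, hypothetical sketch: RGB backbone features serve as attention queries over depth-branch features, so the less domain-sensitive depth modality guides the RGB representation. This is an illustrative assumption only; the class name, feature dimensions, and the use of PyTorch's nn.MultiheadAttention are not taken from the paper.

# Minimal sketch (not the authors' implementation) of RGB features attending
# to depth features, the core idea of cross-modal attentive alignment.
import torch
import torch.nn as nn


class CrossModalAttentiveAlignment(nn.Module):
    """RGB features (queries) attend to depth features (keys/values)."""

    def __init__(self, channels: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W) feature maps from the two branches.
        b, c, h, w = rgb_feat.shape
        q = rgb_feat.flatten(2).transpose(1, 2)     # (B, H*W, C) queries from RGB
        kv = depth_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values from depth
        attended, _ = self.attn(q, kv, kv)          # RGB tokens attend to depth tokens
        out = self.norm(q + attended)               # residual connection + normalization
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    rgb = torch.randn(2, 256, 32, 64)    # RGB backbone features
    depth = torch.randn(2, 256, 32, 64)  # depth-branch features at the same resolution
    aligned = CrossModalAttentiveAlignment()(rgb, depth)
    print(aligned.shape)  # torch.Size([2, 256, 32, 64])

In this sketch the aligned RGB features would then feed an auxiliary multimodal segmentation head, which is how the abstract describes the alignment being supervised; the exact head and loss are not specified here.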
Pages: 11666-11675 (10 pages)