CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Times cited: 1

Authors:

Liu, Peng [1]

Ge, Yanqi [2]

Duan, Lixin [1, 3]

Li, Wen [2]

Lv, Fengmao [4, 5]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China

[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China

[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China

[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China

Source:

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS | 2024 / Vol. 20 / No. 10

Funding:

National Natural Science Foundation of China

Keywords:

Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; domain adaptation; semantic segmentation;

DOI:

10.1109/TII.2024.3412006

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812 ;

Abstract:

Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, yet training such models requires large numbers of pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt segmentation models trained on synthetic data (the source domain) to real-world data (the target domain), because pixel-level annotations are easy to obtain in synthetic environments. Recently, a growing number of UDA approaches have improved cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which explicitly uses depth information to align the main backbone's RGB features across both domains in a nonadversarial manner. In particular, because the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by making them attend to the depth modality through an auxiliary multimodal segmentation task. CAFA achieves state-of-the-art performance on benchmark tasks such as Synthia -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.
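The abstract says the RGB features are aligned by attending them to the depth modality, but this record gives no architectural details. As a generic illustration only, the sketch below shows standard scaled dot-product cross-attention with RGB tokens as queries and depth tokens as keys and values. The function names, the token layout, and the omission of learned query/key/value projections are assumptions made for brevity; this is not the CAFA design.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; each output row sums to 1."""
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def cross_attention(rgb: Matrix, depth: Matrix) -> Matrix:
    """Hypothetical helper: RGB tokens (queries) attend to depth tokens.

    rgb:   n_rgb x d features (queries)
    depth: n_depth x d features (used as both keys and values)
    Learned projections are assumed to be identity here.
    Each output row is a convex combination of the depth rows.
    """
    d = len(rgb[0])
    scale = 1.0 / math.sqrt(d)
    scores = [[s * scale for s in row] for row in matmul(rgb, transpose(depth))]
    weights = softmax_rows(scores)  # n_rgb x n_depth attention map
    return matmul(weights, depth)   # n_rgb x d attended features


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]                # two RGB tokens, d = 2
    depth = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three depth tokens
    print(cross_attention(rgb, depth))
```

Because each attended RGB row is a weighted average of depth rows, the output is expressed in terms of depth features; in a trained model the projections, multiple heads, and the auxiliary segmentation loss mentioned in the abstract would shape what that alignment learns.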


Pages: 11666-11675

Page count: 10

Related records (50 in total; 1-10 shown):

[1] Cross-Domain Scene Unsupervised Learning Segmentation With Dynamic Subdomains
He, Pei
Jiao, Licheng
Liu, Fang
Liu, Xu
Shang, Ronghua
Wang, Shuang
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6770 - 6784
[2] Cross-Modal Cross-Domain Dual Alignment Network for RGB-Infrared Person Re-Identification
Fu, Xiaowei
Huang, Fuxiang
Zhou, Yuhang
Ma, Huimin
Xu, Xin
Zhang, Lei
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6874 - 6887
[3] Cross-Domain Few-Shot Hyperspectral Image Classification With Cross-Modal Alignment and Supervised Contrastive Learning
Li, Zhaokui
Zhang, Chenyang
Wang, Yan
Li, Wei
Du, Qian
Fang, Zhuoqun
Chen, Yushi
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 19
[4] Cross-Domain Transfer Hashing for Efficient Cross-Modal Retrieval
Li, Fengling
Wang, Bowen
Zhu, Lei
Li, Jingjing
Zhang, Zheng
Chang, Xiaojun
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9664 - 9677
[5] Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation
Zhao, Wentian
Wu, Xinxiao
Luo, Jiebo
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1180 - 1192
[6] CACFNet: Cross-Modal Attention Cascaded Fusion Network for RGB-T Urban Scene Parsing
Zhou, Wujie
Dong, Shaohua
Fang, Meixin
Yu, Lu
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01) : 1919 - 1929
[7] Semisupervised Cross-Domain Remote Sensing Scene Classification via Category-Level Feature Alignment Network
Li, Yang
Li, Zhang
Su, Ang
Wang, Kun
Wang, Zi
Yu, Qifeng
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
[8] Cross-Modal Learning for Event-Based Semantic Segmentation via Attention Soft Alignment
Xie, Chuyun
Gao, Wei
Guo, Ren
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (03) : 2359 - 2366
[9] PMDA: Domain Alignment with Prototype Matching for Cross-Domain Adaptive Segmentation
Li, Weiwei
Ren, Yuanyuan
Liu, Junzhuo
Wang, Chenyang
Zheng, Yuchen
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 2339 - 2344
[10] Joint alignment of the distribution in input and feature space for cross-domain aerial image semantic segmentation
Chen, Zhe
Yang, Bisheng
Ma, Ailong
Peng, Mingjun
Li, Haiting
Chen, Tao
Chen, Chi
Dong, Zhen
INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 115
