CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1
Authors
Liu, Peng [1]
Ge, Yanqi [2]
Duan, Lixin [1,3]
Li, Wen [2]
Lv, Fengmao [4,5]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China
[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; domain adaptation; semantic segmentation;
DOI
10.1109/TII.2024.3412006
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, yet training such models requires substantial pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since pixel-level annotations are easy to obtain in a synthetic environment. Recently, a growing number of UDA approaches have promoted cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which provides an explicit way to use depth information to align the main backbone RGB features of both domains in a nonadversarial manner. In particular, because the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending them to the depth modality through an auxiliary multimodal segmentation task. Our CAFA achieves state-of-the-art performance on benchmark tasks such as SYNTHIA -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.
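The attentive alignment described in the abstract can be pictured with a minimal, hypothetical sketch: RGB backbone features serve as attention queries over depth-branch features, so the less domain-sensitive depth modality guides the RGB representation. This is an illustrative assumption only; the class name, feature dimensions, and the use of PyTorch's nn.MultiheadAttention are not taken from the paper.

# Minimal sketch (not the authors' implementation) of RGB features attending
# to depth features, the core idea of cross-modal attentive alignment.
import torch
import torch.nn as nn


class CrossModalAttentiveAlignment(nn.Module):
    """RGB features (queries) attend to depth features (keys/values)."""

    def __init__(self, channels: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W) feature maps from the two branches.
        b, c, h, w = rgb_feat.shape
        q = rgb_feat.flatten(2).transpose(1, 2)     # (B, H*W, C) queries from RGB
        kv = depth_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values from depth
        attended, _ = self.attn(q, kv, kv)          # RGB tokens attend to depth tokens
        out = self.norm(q + attended)               # residual connection + normalization
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    rgb = torch.randn(2, 256, 32, 64)    # RGB backbone features
    depth = torch.randn(2, 256, 32, 64)  # depth-branch features at the same resolution
    aligned = CrossModalAttentiveAlignment()(rgb, depth)
    print(aligned.shape)  # torch.Size([2, 256, 32, 64])

In this sketch the aligned RGB features would then feed an auxiliary multimodal segmentation head, which is how the abstract describes the alignment being supervised; the exact head and loss are not specified here.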
Pages: 11666-11675 (10 pages)