Scene adaptation network for visual-thermal urban scene semantic segmentation

Cited by: 0
Authors
Zhang, Houwang [1]
Li, Yong-Jie [2]
Chan, Leanne Lai-Hang [1]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Life Sci & Technol, Chengdu 610054, Peoples R China
Keywords
Visual-thermal images segmentation; Feature fusion; Urban scenes understanding; Scene adaptation; Self-supervised learning; RGB
DOI
10.1016/j.engappai.2025.111166
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As one aspect of an intelligent transportation system, urban scene segmentation is important for understanding and analyzing a scene as part of the operation of an autonomous driving system. Over recent decades, research into semantic segmentation has made great breakthroughs due to the development of deep learning. However, in complex urban scenes, the performance of deep learning models is usually seriously affected, and the issue of how to effectively combine information from different spectra, such as visual images with red, green, and blue (RGB) colors and thermal infrared (TIR) images for urban scene segmentation has become a hot topic. To achieve effective feature extraction and fusion from different spectra while simultaneously considering the characteristics of urban scenes, a scene adaptation network (SAN) is proposed here to dynamically adjust and incorporate the features from RGB and TIR images based on the scene conditions. We employ multilevel operations and a self-attention module to fuse and enhance the semantic features. In addition, to promote the learning of the correlated and complementary information between different spectra, we design a self-supervised learning scheme for the training of our SAN. Through comprehensive experiments on three visual-thermal urban scene semantic segmentation datasets, we demonstrate the effectiveness and superiority of the proposed SAN against other state-of-the-art approaches.
Pages: 14