A Feature Divide-and-Conquer Network for RGB-T Semantic Segmentation

Cited by: 23
Authors
Zhao, Shenlu [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Affiliations
[1] Xidian Univ, Key Lab Elect Equipment Struct Design, Minist Educ, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Ctr Complex Syst, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Semantic segmentation; Data mining; Semantics; Lighting; Decoding; Thermal sensors; RGB-T semantic segmentation; feature divide-and-conquer strategy; multi-scale contextual information;
DOI
10.1109/TCSVT.2022.3229359
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Similar to other multi-modal pixel-level prediction tasks, existing RGB-T semantic segmentation methods usually employ a two-stream structure to extract RGB and thermal infrared (TIR) features, respectively, and adopt the same fusion strategies to integrate different levels of unimodal features. This results in inadequate extraction of unimodal features and insufficient exploitation of cross-modal information from the paired RGB and TIR images. Alternatively, in this paper, we present a novel RGB-T semantic segmentation model, i.e., FDCNet, where a feature divide-and-conquer strategy performs unimodal feature extraction and cross-modal feature fusion in one go. Concretely, we first employ a two-stream structure to extract unimodal low-level features, followed by a Siamese structure to extract unimodal high-level features from the paired RGB and TIR images. This concise but efficient structure takes into account both the modality discrepancies of low-level features and the underlying semantic consistency of high-level features across the paired RGB and TIR images. Furthermore, considering the characteristics of different layers of features, a Cross-modal Spatial Activation (CSA) module and a Cross-modal Channel Activation (CCA) module are presented for the fusion of low-level and high-level RGB and TIR features, respectively, thus facilitating the capture of cross-modal information. In addition, with an embedded Cross-scale Interaction Context (CIC) module for mining multi-scale contextual information, our proposed FDCNet achieves new state-of-the-art results for RGB-T semantic segmentation on the MFNet and PST900 datasets.
Pages: 2892-2905
Page count: 14
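The divide-and-conquer encoder described in the abstract — modality-specific weights for low-level stages, shared (Siamese) weights for high-level stages — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the layer shapes, weight names, and toy "layer" function are hypothetical, chosen only to show where weights are separate and where they are shared.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    """Toy stand-in for a conv stage: matmul + ReLU."""
    return np.maximum(x @ w, 0.0)

d = 16
w_low_rgb = rng.standard_normal((d, d))  # RGB-specific low-level weights (two-stream)
w_low_tir = rng.standard_normal((d, d))  # TIR-specific low-level weights (two-stream)
w_high = rng.standard_normal((d, d))     # one shared high-level weight set (Siamese)

def encode(rgb, tir):
    # Stage 1: separate streams handle low-level modality discrepancies.
    f_rgb = layer(rgb, w_low_rgb)
    f_tir = layer(tir, w_low_tir)
    # Stage 2: the same weights process both modalities, reflecting the
    # assumed semantic consistency of high-level features.
    h_rgb = layer(f_rgb, w_high)
    h_tir = layer(f_tir, w_high)
    return (f_rgb, f_tir), (h_rgb, h_tir)

rgb = rng.standard_normal((1, d))
tir = rng.standard_normal((1, d))
(lo_rgb, lo_tir), (hi_rgb, hi_tir) = encode(rgb, tir)
print(hi_rgb.shape, hi_tir.shape)  # (1, 16) (1, 16)
```

In the paper's full model, the two fused feature levels would then pass through the CSA/CCA fusion modules and the CIC context module; those are omitted here since their internals are not specified in the abstract.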