A Feature Divide-and-Conquer Network for RGB-T Semantic Segmentation

Cited by: 23

Authors:

Zhao, Shenlu [1,2]

Zhang, Qiang [1,2]

Affiliations:

[1] Xidian Univ, Key Lab Elect Equipment Struct Design, Minist Educ, Xian 710071, Shaanxi, Peoples R China

[2] Xidian Univ, Ctr Complex Syst, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China

Source:

IEEE Transactions on Circuits and Systems for Video Technology | 2023, Vol. 33, No. 6

Funding:

National Natural Science Foundation of China

Keywords:

Feature extraction; Semantic segmentation; Data mining; Semantics; Lighting; Decoding; Thermal sensors; RGB-T semantic segmentation; feature divide-and-conquer strategy; multi-scale contextual information;

DOI:

10.1109/TCSVT.2022.3229359

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Similar to other multi-modal pixel-level prediction tasks, existing RGB-T semantic segmentation methods usually employ a two-stream structure to extract RGB and thermal infrared (TIR) features separately, and adopt the same fusion strategy to integrate unimodal features at every level. This results in inadequate extraction of unimodal features and inadequate exploitation of cross-modal information from the paired RGB and TIR images. As an alternative, in this paper we present a novel RGB-T semantic segmentation model, FDCNet, in which a feature divide-and-conquer strategy performs unimodal feature extraction and cross-modal feature fusion in one go. Concretely, we first employ a two-stream structure to extract unimodal low-level features, followed by a Siamese structure to extract unimodal high-level features from the paired RGB and TIR images. This concise but efficient structure makes it possible to account for both the modality discrepancies of low-level features and the underlying semantic consistency of high-level features across the paired RGB and TIR images. Furthermore, considering the characteristics of features at different layers, a Cross-modal Spatial Activation (CSA) module is presented for fusing low-level RGB and TIR features and a Cross-modal Channel Activation (CCA) module for fusing high-level RGB and TIR features, thus facilitating the capture of cross-modal information. On top of that, with an embedded Cross-scale Interaction Context (CIC) module for mining multi-scale contextual information, the proposed FDCNet achieves new state-of-the-art experimental results on the MFNet and PST900 datasets.
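The layout described in the abstract (separate low-level streams, a weight-shared high-level stage, spatial gating for low-level fusion, channel gating for high-level fusion) can be illustrated with a toy NumPy sketch. This is not the FDCNet implementation: the abstract does not specify the internals of CSA or CCA, so the 1x1 convolutions, the layer widths, the sigmoid gating formula, and every function and variable name below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """Pointwise convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two-stream stage: each modality has its own low-level weights.
w_low_rgb = rng.standard_normal((8, 3))   # RGB input has 3 channels
w_low_tir = rng.standard_normal((8, 1))   # TIR input has 1 channel
# Siamese stage: a single weight set is reused for both modalities.
w_high_shared = rng.standard_normal((16, 8))

def extract(rgb, tir):
    low_rgb = relu(conv1x1(rgb, w_low_rgb))
    low_tir = relu(conv1x1(tir, w_low_tir))
    high_rgb = relu(conv1x1(low_rgb, w_high_shared))  # shared weights
    high_tir = relu(conv1x1(low_tir, w_high_shared))  # shared weights
    return (low_rgb, low_tir), (high_rgb, high_tir)

def spatial_activation_fusion(a, b):
    """Hypothetical stand-in for CSA: one gate value per pixel."""
    gate = sigmoid((a + b).mean(axis=0, keepdims=True))       # (1, H, W)
    return gate * a + (1.0 - gate) * b

def channel_activation_fusion(a, b):
    """Hypothetical stand-in for CCA: one gate value per channel."""
    gate = sigmoid((a + b).mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    return gate * a + (1.0 - gate) * b

rgb = rng.random((3, 4, 4))
tir = rng.random((1, 4, 4))
(low_rgb, low_tir), (high_rgb, high_tir) = extract(rgb, tir)
fused_low = spatial_activation_fusion(low_rgb, low_tir)      # low-level fusion
fused_high = channel_activation_fusion(high_rgb, high_tir)   # high-level fusion
print(fused_low.shape, fused_high.shape)
```

The only structural points taken from the abstract are which stage shares weights and which kind of activation is applied at which level; a real model would use a deep backbone, learned gating layers, and the CIC module, none of which are sketched here.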

Pages: 2892-2905

Page count: 14
