Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation

Cited by: 4
Authors
Liu, Hengyan [1]
Zhang, Wenzhang [2]
Dai, Tianhong [3]
Yin, Longfei [4]
Ren, Guangyu [4]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch AI & Adv Comp, Suzhou, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Internet Things, Suzhou, Peoples R China
[3] Univ Aberdeen, Dept Comp Sci, Aberdeen, Scotland
[4] Imperial Coll London, Dept Elect & Elect Engn, London, England
Source
2024 33RD INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, ICCCN 2024 | 2024
Keywords
Semantic Segmentation; Multimodal Fusion; Frequency Spectrum; Contrastive Learning; Determinantal point processes; NETWORK;
DOI
10.1109/ICCCN61486.2024.10637614
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Semantic segmentation confronts challenges with traditional networks tailored exclusively for RGB inputs, which may suffer from quality degradation under adverse conditions like low-level illumination or inclement weather. Recent advancements have shown promising outcomes by integrating RGB images with corresponding thermal infrared (TIR) images. However, effectively fusing features from both modalities remains a significant challenge. In this paper, we introduce a novel approach termed Multimodal Frequency Spectrum Fusion Schema (MFSFS) for semantic segmentation of RGB-T images. MFSFS leverages the advantages of the frequency spectrum to effectively extract and utilize multimodal feature information. To mitigate redundant information's adverse effects during multimodal fusion in the frequency domain, we propose a diversity-oriented contrastive learning approach. Simulation results demonstrate that MFSFS achieves competitive performance while maintaining a relatively smaller model size.
Pages: 6
Related papers
28 in total
[1]  
Cai H., 2022, ARXIV
[2]  
Chen LC, 2017, Arxiv, DOI arXiv:1706.05587
[3]  
Dai T., 2021, PROC 18 PACIFIC RIM, P32, DOI 10.1007/978-3-030-89370-53
[4]   Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation [J].
Ghiasi, Golnaz ;
Fowlkes, Charless C. .
COMPUTER VISION - ECCV 2016, PT III, 2016, 9907 :519-534
[5]  
Grill Jean-Bastien, 2020, ADV NEUR IN, V33
[6]  
Ha Q, 2017, IEEE INT C INT ROBOT, P5108, DOI 10.1109/IROS.2017.8206396
[7]   FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture [J].
Hazirbas, Caner ;
Ma, Lingni ;
Domokos, Csaba ;
Cremers, Daniel .
COMPUTER VISION - ACCV 2016, PT I, 2017, 10111 :213-228
[8]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[9]  
Hu J, 2018, PROC CVPR IEEE, P7132, DOI [10.1109/TPAMI.2019.2913372, 10.1109/CVPR.2018.00745]
[10]   MMNet: Multi-modal multi-stage network for RGB-T image semantic segmentation [J].
Lan, Xin ;
Gu, Xiaojing ;
Gu, Xingsheng .
APPLIED INTELLIGENCE, 2022, 52 (05) :5817-5829