Cross-level interaction fusion network-based RGB-T semantic segmentation for distant targets

Times cited: 0
Authors
Chen, Yu [1]
Li, Xiang [1]
Luan, Chao [2]
Hou, Weimin [2]
Liu, Haochen [2]
Zhu, Zihui [3]
Xue, Lian [3]
Zhang, Jianqi [1]
Liu, Delian [1]
Wu, Xin [1]
Wei, Linfang [1]
Jian, Chaochao [1]
Li, Jinze [1]
Affiliations
[1] Xidian Univ, Sch Optoelect Engn, Xian 710071, Peoples R China
[2] Beijing Inst Control & Elect Technol, Beijing 100038, Peoples R China
[3] Natl Key Lab Sci & Technol Test Phys & Numer Math, Beijing 100076, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Semantic segmentation; Feature fusion; Cross modality; Multi-scale information; Distant object
DOI
10.1016/j.patcog.2024.111218
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
RGB-T segmentation represents an innovative approach driven by advancements in multispectral detection and is poised to replace traditional RGB segmentation methods. An effective cross-modality feature fusion module is essential for this technology. The precise segmentation of distant objects is another significant challenge. Focused on these two areas, we propose an end-to-end distant object feature fusion network (DOFFNet) for RGB-T segmentation. Initially, we introduce a cross-level interaction fusion strategy (CLIF) and an inter-correlation fusion method (IFFM) in the encoder to enhance multi-scale feature expression and improve fusion accuracy. Subsequently, we propose a residual dense pixel convolution (R-DPC) in the decoder with a trainable upsampling unit that dynamically reconstructs information lost during encoding, particularly for distant objects whose features may vanish after pooling. Experimental results show that our DOFFNet achieves a top mean pixel accuracy of 75.8% and dramatically improves accuracy for four classes, including objects occupying as little as 0.2%-2% of total pixels. This improvement ensures more reliable and effective performance in practical applications, particularly in scenarios where small object detection is critical. Moreover, it demonstrates potential applicability in other fields like medical imaging and remote sensing.
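The abstract reports a mean pixel accuracy figure and stresses classes that occupy only a small share of pixels, but this record does not define the metric. The sketch below assumes the common definition (per-class pixel accuracy averaged over the classes present in the ground truth) and uses an invented toy confusion matrix, not data from the paper. The function names `mean_pixel_accuracy`, `overall_pixel_accuracy`, and `pixel_share` are hypothetical and exist only for illustration.

```python
def mean_pixel_accuracy(confusion):
    """Average of per-class pixel accuracy.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. Classes with no ground-truth pixels are
    skipped (an assumption; evaluation protocols differ on this point).
    """
    per_class = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total == 0:
            continue
        per_class.append(row[i] / total)
    if not per_class:
        raise ValueError("confusion matrix contains no ground-truth pixels")
    return sum(per_class) / len(per_class)


def overall_pixel_accuracy(confusion):
    """Fraction of all pixels that are classified correctly."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total


def pixel_share(confusion):
    """Share of all pixels that belongs to each ground-truth class."""
    total = sum(sum(row) for row in confusion)
    return [sum(row) / total for row in confusion]


# Toy data (invented): class 0 is background, class 2 covers 1% of pixels.
confusion = [
    [9500, 60, 40],
    [60, 240, 0],
    [30, 0, 70],
]

shares = pixel_share(confusion)
m_acc = mean_pixel_accuracy(confusion)
p_acc = overall_pixel_accuracy(confusion)
```

In this toy case the overall pixel accuracy is dominated by the background class, while the mean pixel accuracy weights the 1% class as heavily as the background. Under the assumed definition, that is why gains on small or distant objects show up more clearly in the mean figure.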
Pages: 13