STAMF: Synergistic transformer and mamba fusion network for RGB-Polarization based underwater salient object detection

Cited by: 0
Authors
Ma, Qianwen [1 ]
Li, Xiaobo [1 ]
Li, Bincheng [1 ]
Zhu, Zhen [1 ]
Wu, Jing [2 ]
Huang, Feng [2 ]
Hu, Haofeng [1 ]
Affiliations
[1] Tianjin Univ, Sch Marine Sci & Technol, Tianjin 300072, Peoples R China
[2] Fuzhou Univ, Coll Mech Engn & Automat, Fuzhou 350108, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Underwater salient object detection; Polarimetric imaging; Transformer; Mamba; Dataset; NEURAL-NETWORK; MODEL;
DOI
10.1016/j.inffus.2025.103182
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The quality of underwater imaging is severely degraded by light scattering and absorption from suspended particles, limiting the effectiveness of subsequent underwater salient object detection (USOD) tasks. Polarization information offers a unique perspective by capturing the intrinsic physical properties of objects, potentially enhancing the contrast between objects and background in complex scenes. However, it is rarely applied in the field of USOD. In this paper, we build a dataset named TJUP-USOD, which includes both RGB and polarization (i.e., RGB-P) images; based on this, we design a USOD network, called STAMF, to exploit the strengths of both color and polarization information. STAMF synthesizes these complementary information streams to generate high-contrast, vivid scene representations that improve the discernibility of underwater features. Specifically, the Omnidirectional Tokens-to-Token Vision Mamba notably amplifies the capacity to handle both global and local information through multidirectional scanning and iterative integration of inputs. In addition, the Mamba Cross-Modal Fusion Module adeptly merges RGB and polarization features, leveraging global context to refine local pixel-wise fusion and to mitigate the misguidance that erroneous modal data can introduce in demanding underwater environments. Comparative experiments with 27 methods and extensive ablation studies demonstrate that the proposed STAMF, with only 25.85 million parameters, effectively leverages RGB-P information, achieves state-of-the-art performance, and opens a new door for USOD tasks. STAMF further underscores the value of increasing the dimensionality of USOD datasets; exploring network structures built on such multi-dimensional data promises to further enhance task performance. The code and dataset are publicly available at https://github.com/Kingwin97/STAMF.
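The polarization cues and cross-modal fusion discussed in the abstract can be illustrated with a minimal numerical sketch. The Stokes-parameter arithmetic below is standard division-of-focal-plane polarimetry (four linear-polarizer angles); the function names `stokes_dolp` and `gated_fusion`, and the simple sigmoid gate, are illustrative assumptions only and are not the paper's Mamba Cross-Modal Fusion Module.

```python
import numpy as np

def stokes_dolp(i0, i45, i90, i135):
    """Compute the total intensity S0 and the degree of linear
    polarization (DoLP) from four linear-polarizer intensity images,
    using the standard Stokes-parameter relations."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical
    s2 = i45 - i135                      # diagonal components
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-8)
    return s0, np.clip(dolp, 0.0, 1.0)

def gated_fusion(feat_rgb, feat_pol):
    """Toy per-pixel gated fusion of an RGB feature map and a
    polarization feature map: a sigmoid gate weights each modality.
    Illustrative only -- a stand-in for any learned fusion module."""
    gate = 1.0 / (1.0 + np.exp(-(feat_rgb - feat_pol)))
    return gate * feat_rgb + (1.0 - gate) * feat_pol
```

For fully linearly polarized light (e.g. `i0 = 1`, `i90 = 0`, `i45 = i135 = 0.5`) the DoLP evaluates to 1, while unpolarized light (all four intensities equal) gives 0; high-DoLP regions are exactly where a polarization branch can add contrast that the RGB branch lacks.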
Pages: 15