STAMF: Synergistic transformer and mamba fusion network for RGB-Polarization based underwater salient object detection

Cited by: 0
Authors
Ma, Qianwen [1 ]
Li, Xiaobo [1 ]
Li, Bincheng [1 ]
Zhu, Zhen [1 ]
Wu, Jing [2 ]
Huang, Feng [2 ]
Hu, Haofeng [1 ]
Affiliations
[1] Tianjin Univ, Sch Marine Sci & Technol, Tianjin 300072, Peoples R China
[2] Fuzhou Univ, Coll Mech Engn & Automat, Fuzhou 350108, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Underwater salient object detection; Polarimetric imaging; Transformer; Mamba; Dataset; NEURAL-NETWORK; MODEL;
DOI
10.1016/j.inffus.2025.103182
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The quality of underwater imaging is severely degraded by light scattering and absorption from suspended particles, limiting the effectiveness of subsequent underwater salient object detection (USOD) tasks. Polarization information offers a unique perspective by capturing the intrinsic physical properties of objects, potentially enhancing the contrast between objects and background in complex scenes. However, it is rarely applied in the field of USOD. In this paper, we build a dataset named TJUP-USOD, which includes both RGB and polarization (i.e., RGB-P) images; based on this, we design a USOD network, called STAMF, to exploit the strengths of both color and polarization information. STAMF synthesizes these complementary information streams to generate high-contrast, vivid scene representations that improve the discernibility of underwater features. Specifically, the Omnidirectional Tokens-to-Token Vision Mamba notably amplifies the capacity to handle both global and local information through multidirectional scanning and iterative integration of inputs. In addition, the Mamba Cross-Modal Fusion Module adeptly merges RGB and polarization features, leveraging global context to refine local pixel-wise fusion and to mitigate the misguidance that erroneous modal data can introduce in demanding underwater environments. Comparative experiments with 27 methods and extensive ablation studies demonstrate that the proposed STAMF, with only 25.85 million parameters, effectively leverages RGB-P information, achieves state-of-the-art performance, and opens a new door for USOD tasks. STAMF further underscores the value of increasing the dimensionality of USOD datasets; exploring network structures built on such multi-dimensional data promises to further enhance task performance. The code and dataset are publicly available at https://github.com/Kingwin97/STAMF.
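The polarization cues and cross-modal fusion discussed in the abstract can be illustrated with a minimal numerical sketch. The Stokes-parameter arithmetic below is standard division-of-focal-plane polarimetry (four linear-polarizer angles); the function names `stokes_dolp` and `gated_fusion`, and the simple sigmoid gate, are illustrative assumptions only and are not the paper's Mamba Cross-Modal Fusion Module.

```python
import numpy as np

def stokes_dolp(i0, i45, i90, i135):
    """Compute the total intensity S0 and the degree of linear
    polarization (DoLP) from four linear-polarizer intensity images,
    using the standard Stokes-parameter relations."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical
    s2 = i45 - i135                      # diagonal components
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-8)
    return s0, np.clip(dolp, 0.0, 1.0)

def gated_fusion(feat_rgb, feat_pol):
    """Toy per-pixel gated fusion of an RGB feature map and a
    polarization feature map: a sigmoid gate weights each modality.
    Illustrative only -- a stand-in for any learned fusion module."""
    gate = 1.0 / (1.0 + np.exp(-(feat_rgb - feat_pol)))
    return gate * feat_rgb + (1.0 - gate) * feat_pol
```

For fully linearly polarized light (e.g. `i0 = 1`, `i90 = 0`, `i45 = i135 = 0.5`) the DoLP evaluates to 1, while unpolarized light (all four intensities equal) gives 0; high-DoLP regions are exactly where a polarization branch can add contrast that the RGB branch lacks.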
Pages: 15