Efficient SonarNet: Lightweight CNN-Grafted Vision Transformer Embedding Network for Forward-Looking Sonar Image Segmentation

Times cited: 5
Authors
He, Ju [1]
Xu, Hu [1]
Li, Shaohong [1]
Yu, Yang [1, 2]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 518057, Guangdong, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Keywords
Transformers; Feature extraction; Sonar; Semantics; Convolutional neural networks; Accuracy; Computer architecture; Deep learning (DL); feature fusion; forward-looking sonar (FLS); semantic segmentation; transformer
DOI
10.1109/TGRS.2024.3435883
Chinese Library Classification (CLC) numbers
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
While the intricate underwater environment leaves sonar targets with blurry and faint features, SonarNet has achieved great success owing to its strong modeling capability and multifeature fusion mechanism. However, this performance relies on heavier backbones and larger model sizes, so its benefits come at the cost of increased complexity. Fast segmenters that can be deployed on edge devices are therefore urgently needed. In this article, we analyze SonarNet, the current best-performing sonar image segmentation network. Based on this analysis, we propose EsonarNet, a lightweight local-feature-grafted vision transformer (ViT) embedding network for forward-looking sonar (FLS) images that strikes a better balance between efficiency and accuracy. EsonarNet is built on a hybrid local-global feature grafting architecture and comprises four modules. First, extending the traditional convolutional neural network (CNN) and histogram of oriented gradients (HOG), we build a lightweight sonar semantic segmentation model based on hybrid CNN-transformer-HOG fusion encoding and decoding that remains efficient in its use of hardware resources. Second, EsonarNet employs lightweight encoder units, including a newly designed spatial mobile inverted bottleneck convolution (SMBConv) and an efficient ViT module. Third, the local-global features interaction (LGFI) module serves as a transitional liaison between the CNN encoder and the transformer encoder, dispersing local semantic information to facilitate long-distance computations, while the global-local aggregation unit (GLAU) module computes correlations through dot products to restore inductive bias. Fourth, HOG features are introduced into EsonarNet through the lightweight HOG-deep learning graft mechanism (LHDGM) module, which ensures the coherence and compatibility of the acquired traditional and abstract information, which carry different semantics.
Experimental results demonstrate that EsonarNet outperforms other methods in efficiency for FLS image segmentation.
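The abstract does not describe how EsonarNet computes the HOG features it grafts into the network. For readers unfamiliar with HOG, the following is a minimal sketch of the standard cell-histogram step in plain Python. The function name `hog_cell_histograms`, the 8x8 cell size, the 9 unsigned orientation bins, and the list-of-rows image format are illustrative assumptions, not details from the paper. The sketch uses hard binning and omits the block normalization that a full HOG descriptor applies.

```python
import math


def hog_cell_histograms(img, cell=8, bins=9):
    """Hypothetical helper: unsigned-orientation HOG cell histograms.

    img: 2-D grayscale image as a list of equal-length rows of numbers.
    Returns hist[cy][cx][b], the summed gradient magnitude that falls into
    orientation bin b of cell (cy, cx). Pixels outside the last full cell
    are ignored. Block normalization (part of full HOG) is omitted.
    """
    h, w = len(img), len(img[0])
    n_cy, n_cx = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(n_cx)] for _ in range(n_cy)]
    bin_width = 180.0 / bins  # unsigned orientations span [0, 180) degrees

    for y in range(n_cy * cell):
        for x in range(n_cx * cell):
            # Central-difference gradients, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the signed angle into [0, 180) and pick its bin (hard binning).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(angle // bin_width) % bins  # % bins guards against 180.0 from rounding
            hist[y // cell][x // cell][b] += mag
    return hist


if __name__ == "__main__":
    # An 8x8 image with a vertical step edge: all votes land in bin 0.
    edge = [[0] * 4 + [1] * 4 for _ in range(8)]
    print(hog_cell_histograms(edge)[0][0])
```

For the 8x8 vertical-edge image in the usage example, the two pixel columns adjacent to the edge each contribute a horizontal gradient of magnitude 1, so the single cell's bin 0 receives 16.0 and every other bin stays at 0.0.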
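The abstract states only that the GLAU module "computes correlations through dot products." As a hypothetical illustration of that general idea, and not the authors' GLAU design, the sketch below lets each local (CNN) token attend to a set of global (transformer) tokens using scaled dot-product weights, then adds the aggregated global context back to the local token as a residual. The function names, the equal token dimensions, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot_product_aggregate(local_tokens, global_tokens):
    """Hypothetical sketch, NOT the paper's GLAU module.

    Each local token acts as a query; global tokens serve as both keys and
    values (no learned projections). Tokens are equal-length float lists.
    Returns one updated token per local token: local + attended global context.
    """
    d = len(local_tokens[0])
    scale = 1.0 / math.sqrt(d)  # standard 1/sqrt(d) attention scaling
    out = []
    for q in local_tokens:
        # Dot-product correlation between this local token and every global token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in global_tokens]
        weights = softmax(scores)
        # Weighted sum of global tokens, dimension by dimension.
        agg = [sum(w * v[j] for w, v in zip(weights, global_tokens)) for j in range(d)]
        # Residual connection keeps the original local (inductive-bias) features.
        out.append([qi + ai for qi, ai in zip(q, agg)])
    return out


if __name__ == "__main__":
    print(dot_product_aggregate([[1.0, 0.0]], [[0.5, 0.5]]))
```

With a single global token the softmax weight is 1.0, so the usage example simply adds `[0.5, 0.5]` to the local token. Scaling by 1/sqrt(d) keeps dot products from growing with token dimension, the same convention used in standard transformer attention.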
Pages: 17
Number of references: 69