A convolutional vision transformer for semantic segmentation of side-scan sonar data

Cited by: 8

Authors:

Rajani, Hayat [1]

Gracias, Nuno [1]

Garcia, Rafael [1]

Affiliations:

[1] Univ Girona, Comp Vis & Robot Res Inst ViCOROB, Campus Montilivi, Edifici P4, Girona 17003, Catalonia, Spain

Source:

OCEAN ENGINEERING | 2023 / Vol. 286

Funding:

European Union Horizon 2020;

Keywords:

Seafloor segmentation; Side-scan sonar; Vision transformer; Convolutional transformer; Real-time;

DOI:

10.1016/j.oceaneng.2023.115647

Chinese Library Classification (CLC) number:

U6 [Waterway Transportation]; P75 [Ocean Engineering];

Discipline classification codes:

0814; 081505; 0824; 082401;

Abstract:

Distinguishing among different marine benthic habitat characteristics is of key importance in a wide set of seabed operations ranging from installations of oil rigs to laying networks of cables and monitoring the impact of humans on marine ecosystems. The Side-Scan Sonar (SSS) is a widely used imaging sensor in this regard. It produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce pixel-wise categorization of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder-decoder framework. Further, in doing so, the applicability of ViTs is evaluated on smaller datasets. To overcome the lack of CNN-like inductive biases, thereby making ViTs more conducive to applications in low data regimes, we propose a novel feature extraction module to replace the Multi-layer Perceptron (MLP) block within transformer layers and a novel module to extract multiscale patch embeddings. A lightweight decoder is also proposed to complement this design in order to further enhance multiscale feature extraction. With the modified architecture, we achieve state-of-the-art results and also meet real-time computational requirements. We make our code available at https://github.com/hayatrajani/s3seg-vit.


Pages: 12
