A convolutional vision transformer for semantic segmentation of side-scan sonar data

Cited by: 8
Authors
Rajani, Hayat [1]
Gracias, Nuno [1]
Garcia, Rafael [1]
Affiliations
[1] Univ Girona, Comp Vis & Robot Res Inst ViCOROB, Campus Montilivi,Edifici P4, Girona 17003, Catalonia, Spain
Funding
European Union Horizon 2020;
Keywords
Seafloor segmentation; Side-scan sonar; Vision transformer; Convolutional transformer; Real-time;
DOI
10.1016/j.oceaneng.2023.115647
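Chinese Library Classification (CLC) number
U6 [Waterway Transportation]; P75 [Ocean Engineering];
Discipline classification codes
0814 ; 081505 ; 0824 ; 082401 ;
Abstract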
Distinguishing among different marine benthic habitat characteristics is of key importance in a wide set of seabed operations ranging from installations of oil rigs to laying networks of cables and monitoring the impact of humans on marine ecosystems. The Side-Scan Sonar (SSS) is a widely used imaging sensor in this regard. It produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce pixel-wise categorization of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder-decoder framework. Further, in doing so, the applicability of ViTs is evaluated on smaller datasets. To overcome the lack of CNN-like inductive biases, thereby making ViTs more conducive to applications in low data regimes, we propose a novel feature extraction module to replace the Multi-layer Perceptron (MLP) block within transformer layers and a novel module to extract multiscale patch embeddings. A lightweight decoder is also proposed to complement this design in order to further enhance multiscale feature extraction. With the modified architecture, we achieve state-of-the-art results and also meet real-time computational requirements. We make our code available at https://github.com/hayatrajani/s3seg-vit.
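The abstract mentions a module that extracts multiscale patch embeddings but does not describe how it works. The sketch below only illustrates the general idea of multiscale patch tokenization and is not the authors' module: it cuts windows of several sizes around the same grid positions and concatenates them into one token per position. The function names, the patch sizes (2 and 4), the stride, and the use of raw pixel values in place of learned projections are all assumptions made for illustration.

```python
from typing import List, Sequence

Image = List[List[float]]


def pad_image(image: Image, pad: int) -> Image:
    """Zero-pad a 2D image by `pad` pixels on every side."""
    width = len(image[0]) + 2 * pad
    top_bottom = [[0.0] * width for _ in range(pad)]
    middle = [[0.0] * pad + list(row) + [0.0] * pad for row in image]
    return top_bottom + middle + [row[:] for row in top_bottom]


def extract_patches(image: Image, patch: int, stride: int) -> List[List[float]]:
    """Flatten every patch x patch window, sampled every `stride` pixels."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            tokens.append([image[top + r][left + c]
                           for r in range(patch) for c in range(patch)])
    return tokens


def multiscale_tokens(image: Image,
                      patch_sizes: Sequence[int] = (2, 4),
                      stride: int = 2) -> List[List[float]]:
    """Concatenate windows of several sizes centred on the same grid cells.

    Hypothetical stand-in for a learned multiscale embedding: a real model
    would apply learned projections (e.g. convolutions) instead of copying
    raw pixels. Assumes image sides are divisible by `stride`.
    """
    per_scale = []
    for patch in patch_sizes:
        if patch < stride or (patch - stride) % 2:
            raise ValueError("patch - stride must be a non-negative even number")
        # Padding keeps the token grid identical across scales.
        padded = pad_image(image, (patch - stride) // 2)
        per_scale.append(extract_patches(padded, patch, stride))
    # One token per grid position, with features from every scale.
    return [sum(parts, []) for parts in zip(*per_scale)]


# Usage: an 8x8 toy "sonar intensity map" gives a 4x4 grid of tokens.
toy_image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = multiscale_tokens(toy_image)
print(len(tokens), len(tokens[0]))
```

Each token here holds 4 values from the 2x2 window and 16 from the 4x4 window, so larger windows add surrounding context without changing the number of tokens.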
Pages: 12