UAPT: an underwater acoustic target recognition method based on pre-trained Transformer

Cited by: 1

Authors:

Tang, Jun [1]

Ma, Enxue [1]

Qu, Yang [1]

Gao, Wenbo [1]

Zhang, Yuchen [1]

Gan, Lin [2]

Affiliations:

[1] Tianjin Univ, Sch Civil Engn, Tianjin 300072, Peoples R China

[2] Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2025 / Vol. 31 / No. 01

Keywords:

Underwater acoustic target recognition; Transformer; Transfer learning; Deep learning; Pre-train; MODEL;

DOI:

10.1007/s00530-024-01614-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Convolutional Neural Network (CNN) models used in underwater acoustic target recognition (UATR) research are limited by their inability to capture long-distance dependencies, which impedes their capacity to focus on global information within the underwater acoustic signal. In contrast, the Transformer model has progressively emerged as the optimal choice in various studies, owing to its exclusive dependence on the attention mechanism for extracting global features from input data. The limited research applying the Transformer model to UATR has relied on an early ViT model. In this paper, two refined Transformer models, namely Swin Transformer and Biformer, are adopted as the foundational networks, and a novel Swin Biformer model is proposed by harnessing the strengths of the two. Experimental results demonstrate the consistent superiority of the three models over CNN and ViT in UATR, with the Swin Biformer model attaining the highest recognition accuracy, 94.3%, on a dataset constructed from the DeepShip database. This paper also proposes a UATR method based on a pre-trained Transformer. Its effectiveness is supported by experimental findings: a recognition accuracy of approximately 97% was achieved on a generalized dataset derived from the ShipsEar database. Even with limited data samples and more stringent classification requirements, the method maintains a recognition accuracy of over 90% while significantly reducing the training duration.


Pages: 15
