Self-supervised learning with multimodal remote sensed maps for seafloor visual class inference

Cited by: 0

Authors:

Liang, Cailei [1]

Cappelletto, Jose [1]

Massot-Campos, Miquel [1]

Bodenmann, Adrian [1]

Huvenne, Veerle A. I. [2]

Wardell, Catherine [2]

Bett, Brian J. [2]

Newborough, Darryl [3]

Thornton, Blair [1,4]

Affiliations:

[1] University of Southampton, Centre for In Situ and Remote Intelligent Sensing, Southampton, Hampshire, England

[2] National Oceanography Centre, Ocean BioGeosciences, Southampton, England

[3] Sonardyne International Ltd, Yateley, England

[4] University of Tokyo, Institute of Industrial Science, Tokyo, Japan

Source:

International Journal of Robotics Research | 2025

Keywords:

Multimodal feature learning; location-based regularisation; self-supervision; seafloor mapping; habitat classification; NEURAL-NETWORKS; CLASSIFICATION; IMAGE; SCALE; RECOGNITION; FEATURES; GROWTH; SMOTE; BAY

DOI:

10.1177/02783649251343640

Chinese Library Classification number:

TP24 [Robotics]

Discipline classification codes:

080202; 1405

Abstract:

Seafloor surveys often gather multiple modes of remote sensed mapping and sampling data to infer kilo- to mega-hectare scale seafloor habitat distributions. However, efforts to extract information from multimodal data are complicated by inconsistencies between measurement modes (e.g., resolution, positional offsets, geometric distortions) and different acquisition periods for dynamically changing environments. In this study, we investigate the use of location information during multimodal feature learning and its impact on habitat classification. Experiments on multimodal datasets gathered from three Marine Protected Areas (MPAs) showed improved robustness and performance when using location-based regularisation terms compared to equivalent autoencoder-based and contrastive self-supervised feature learners. Location-guiding improved F1 scores by 7.7% for autoencoder-based and 28.8% for contrastive feature learners averaged across 78 experiments on datasets spanning three distinct sites and 18 data modes. Location-guiding enhances performance when combining multimodal data, increasing F1 scores by an average of 8.8% and 37.8% compared to the best-performing individual mode being combined for autoencoder-based and contrastive self-supervised models, respectively. Performance gains are maintained over a large range of location-guiding distance hyperparameters, where improvements of 5.3% and 29.4% are achieved on average over an order-of-magnitude range of hyperparameters for the autoencoder and contrastive learners, respectively, both comparing favourably with optimally tuned conditions. Location-guiding also exhibits robustness to position inconsistencies between combined data modes, still achieving an average of 3.0% and 30.4% increase in performance compared to equivalent feature learners without location regularisation when position offsets of up to 10 m are artificially introduced to the remote sensed data. Our results show that the classifier used to delineate the learned feature spaces has less impact on performance than the feature learner, with probabilistic classifiers averaging 3.4% higher F1 scores than non-probabilistic classifiers.
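
The abstract does not state the form of the location-based regularisation term. As a minimal sketch only, assuming a Gaussian weighting of pairwise geographic distance (an illustrative assumption, not the paper's formulation), a location-guided penalty could be added to an autoencoder reconstruction loss along these lines. The names `location_guided_penalty`, `total_loss`, `distance_scale` and `weight` are hypothetical.

```python
import numpy as np


def location_guided_penalty(latents, positions, distance_scale):
    """Hypothetical regulariser (assumed form, not taken from the paper).

    Returns the weighted mean squared latent distance over all sample pairs,
    where each pair is weighted by a Gaussian kernel of its geographic
    separation, so that nearby samples are pulled together in feature space.
    """
    # Pairwise squared distances in latent space and in geographic space.
    latent_d2 = np.sum((latents[:, None, :] - latents[None, :, :]) ** 2, axis=-1)
    geo_d2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    # Nearby pairs get weights close to 1, distant pairs close to 0.
    weights = np.exp(-geo_d2 / (2.0 * distance_scale ** 2))
    np.fill_diagonal(weights, 0.0)  # ignore each sample paired with itself
    total_weight = weights.sum()
    if total_weight == 0.0:
        return 0.0  # no pair is close enough to contribute
    return float(np.sum(weights * latent_d2) / total_weight)


def total_loss(inputs, reconstructions, latents, positions,
               distance_scale=10.0, weight=0.1):
    """Reconstruction error plus the assumed location-guided penalty."""
    reconstruction = float(np.mean((inputs - reconstructions) ** 2))
    penalty = location_guided_penalty(latents, positions, distance_scale)
    return reconstruction + weight * penalty


# Toy data: two pairs of neighbouring samples, about 100 m apart.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 0.0], [101.0, 0.0]])
# Latents that agree between neighbours versus latents that disagree.
latents_consistent = np.array([[0.0], [0.0], [5.0], [5.0]])
latents_inconsistent = np.array([[0.0], [5.0], [0.0], [5.0]])

penalty_consistent = location_guided_penalty(latents_consistent, positions, 5.0)
penalty_inconsistent = location_guided_penalty(latents_inconsistent, positions, 5.0)
```

In this sketch `distance_scale` stands in for a location-guiding distance hyperparameter: latents that agree between neighbouring samples incur a much smaller penalty than latents that disagree, which is the qualitative behaviour a location-based regulariser is meant to encourage.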


Pages: 25

Number of references: 86

