Industrial object detection with multi-modal SSD: closing the gap between synthetic and real images

Cited by: 0
Authors
Julia Cohen
Carlos Crispim-Junior
Jean-Marc Chiappa
Laure Tougne Rodet
Affiliations
[1] Université de Lyon
[2] Univ Lyon 2
[3] CNRS
[4] Centrale Lyon
[5] INSA Lyon
[6] UCBL
[7] LIRIS
[8] UMR5205
[9] DEMS
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Object detection; Deep learning; Synthetic dataset; Industrial; RGB-D;
DOI
Not available
Abstract
Object detection for industrial applications faces challenges that state-of-the-art deep learning models have yet to solve. Training data are usually scarce, and the common workaround of using a synthetic dataset introduces a domain gap when the model is applied to real images. Moreover, few architectures fit in the small memory of a mobile device and run in real time under limited computation capabilities. The models that do fulfill these requirements generally have low learning capacity, and the domain gap further reduces their performance. In this work, we propose multiple strategies to reduce the domain gap when using RGB-D images and to increase the overall performance of a Convolutional Neural Network (CNN) for object detection with a reasonable increase in model size. First, we propose a new architecture based on the Single Shot Detector (SSD), and we compare different fusion methods that increase performance with few or no additional parameters. We apply the proposed method to three synthetic datasets with different visual characteristics and show that classical image processing significantly reduces the domain gap for depth maps. Our experiments show an improvement when fusing RGB and depth images on two benchmark datasets, even when the depth maps contain little discriminative information. Our RGB-D SSD Lite model performs on par with or better than a ResNet-FPN RetinaNet model on the LINEMOD and T-LESS datasets, while requiring 20 times less computation. Finally, we provide insights on training a robust model for improved performance when one of the modalities is missing.
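To make the fusion idea concrete, the sketch below shows one parameter-free way to combine RGB and depth features in a dual-branch, SSD-style backbone (element-wise addition of branch outputs). The module names, toy branches, and the choice of fusion operator are illustrative assumptions, not the authors' exact design; the abstract only states that several fusion methods with few or no additional parameters are compared.

```python
# Hypothetical sketch of parameter-free RGB-D feature fusion for an SSD-style
# detector. Module and variable names are illustrative assumptions, not the
# authors' actual implementation.
import torch
import torch.nn as nn


class RGBDFusionBackbone(nn.Module):
    """Two lightweight branches (RGB and depth) fused by element-wise addition,
    which adds no extra parameters to the detector."""

    def __init__(self, rgb_branch: nn.Module, depth_branch: nn.Module):
        super().__init__()
        self.rgb_branch = rgb_branch      # e.g. a MobileNet-style feature extractor
        self.depth_branch = depth_branch  # same topology, fed a 3-channel depth map

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        f_rgb = self.rgb_branch(rgb)
        f_depth = self.depth_branch(depth)
        # Parameter-free fusion: element-wise sum of the two feature maps.
        # Concatenation followed by a 1x1 convolution is a common alternative
        # that adds only a small number of parameters.
        return f_rgb + f_depth


if __name__ == "__main__":
    # Minimal usage example with toy branches standing in for the real backbones.
    toy_rgb = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
    toy_depth = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
    model = RGBDFusionBackbone(toy_rgb, toy_depth)
    fused = model(torch.randn(1, 3, 300, 300), torch.randn(1, 3, 300, 300))
    print(fused.shape)  # torch.Size([1, 16, 300, 300])
```

The fused feature maps would then feed the SSD detection heads as usual; depth maps are assumed here to be preprocessed (e.g. normalized and replicated to three channels) so that both branches can share the same topology.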
Pages: 12111 - 12138
Page count: 27
Related Papers
50 records in total
  • [21] M2FNet: Multi-modal fusion network for object detection from visible and thermal infrared images
    Jiang, Chenchen
    Ren, Huazhong
    Yang, Hong
    Huo, Hongtao
    Zhu, Pengfei
    Yao, Zhaoyuan
    Li, Jing
    Sun, Min
    Yang, Shihao
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 130
  • [22] Multi-modal object detection using unsupervised transfer learning and adaptation techniques
    Abbott, Rachael
    Robertson, Neil
    del Rincon, Jesus Martinez
    Connor, Barry
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN DEFENSE APPLICATIONS, 2019, 11169
  • [23] Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
    Xu, Yifan
    Zhang, Mengdan
    Yang, Xiaoshan
    Xu, Changsheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 6253 - 6267
  • [24] UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection
    Guo, Ruohao
    Ying, Xianghua
    Qi, Yanyu
    Qu, Liao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7622 - 7635
  • [25] Multi-Modal Detection of Man-Made Objects in Simulated Aerial Images
    Baran, Matthew S.
    Tutwiler, Richard L.
    Natale, Donald J.
    Bassett, Michael S.
    Harner, Matthew P.
    ALGORITHMS AND TECHNOLOGIES FOR MULTISPECTRAL, HYPERSPECTRAL, AND ULTRASPECTRAL IMAGERY XIX, 2013, 8743
  • [26] Multi-Modal Weights Sharing and Hierarchical Feature Fusion for RGBD Salient Object Detection
    Xiao, Fen
    Li, Bin
    Peng, Yimu
    Cao, Chunhong
    Hu, Kai
    Gao, Xieping
    IEEE ACCESS, 2020, 8 : 26602 - 26611
  • [27] Small Object Detection Technology Using Multi-Modal Data Based on Deep Learning
    Park, Chi-Won
    Seo, Yuri
    Sun, Teh-Jen
    Lee, Ga-Won
    Huh, Eui-Nam
    2023 INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN, 2023, : 420 - 422
  • [28] Cloud and Cloud Shadow Detection for Multi-Modal Imagery With Gap-Filling Applications
    Cho, Keunhoo
    Park, Seongwook
    Seong, Boram
    Lee, Seongwhan
    Park, Jae-Pil
    IEEE ACCESS, 2025, 13 : 7396 - 7406
  • [29] Object detection based on multi-modal adaptive fusion using YOLOv3
    Sheikh, Aarfa Bano
    Baru, Apurva
    Desai, Sanjana Shinde
    Mangale, Supriya
    JOURNAL OF APPLIED REMOTE SENSING, 2022, 16 (02)
  • [30] Multi-modal feature fusion for 3D object detection in the production workshop
    Hou, Rui
    Chen, Guangzhu
    Han, Yinhe
    Tang, Zaizuo
    Ru, Qingjun
    APPLIED SOFT COMPUTING, 2022, 115