Industrial object detection with multi-modal SSD: closing the gap between synthetic and real images

Cited by: 0
Authors
Julia Cohen
Carlos Crispim-Junior
Jean-Marc Chiappa
Laure Tougne Rodet
Affiliations
[1] Université de Lyon
[2] Univ Lyon 2
[3] CNRS
[4] Centrale Lyon
[5] INSA Lyon
[6] UCBL
[7] LIRIS
[8] UMR5205
[9] DEMS
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Object detection; Deep learning; Synthetic dataset; Industrial; RGB-D;
DOI
Not available
Abstract
Object detection for industrial applications faces challenges that state-of-the-art deep learning models have yet to solve. Training data are usually scarce, and the common workaround of using a synthetic dataset introduces a domain gap when the model is applied to real images. Moreover, few architectures fit in the small memory of a mobile device and run in real time under limited computation capabilities. The models that do fulfill these requirements generally have low learning capacity, and the domain gap further reduces their performance. In this work, we propose multiple strategies to reduce the domain gap when using RGB-D images and to increase the overall performance of a Convolutional Neural Network (CNN) for object detection with a reasonable increase in model size. First, we propose a new architecture based on the Single Shot Detector (SSD), and we compare different fusion methods that increase performance with few or no additional parameters. We apply the proposed method to three synthetic datasets with different visual characteristics and show that classical image processing significantly reduces the domain gap for depth maps. Our experiments show an improvement when fusing RGB and depth images on two benchmark datasets, even when the depth maps contain little discriminative information. Our RGB-D SSD Lite model performs on par with or better than a ResNet-FPN RetinaNet model on the LINEMOD and T-LESS datasets, while requiring 20 times less computation. Finally, we provide insights on training a robust model for improved performance when one of the modalities is missing.
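To make the fusion idea concrete, the sketch below shows one parameter-free way to combine RGB and depth features in a dual-branch, SSD-style backbone (element-wise addition of branch outputs). The module names, toy branches, and the choice of fusion operator are illustrative assumptions, not the authors' exact design; the abstract only states that several fusion methods with few or no additional parameters are compared.

```python
# Hypothetical sketch of parameter-free RGB-D feature fusion for an SSD-style
# detector. Module and variable names are illustrative assumptions, not the
# authors' actual implementation.
import torch
import torch.nn as nn


class RGBDFusionBackbone(nn.Module):
    """Two lightweight branches (RGB and depth) fused by element-wise addition,
    which adds no extra parameters to the detector."""

    def __init__(self, rgb_branch: nn.Module, depth_branch: nn.Module):
        super().__init__()
        self.rgb_branch = rgb_branch      # e.g. a MobileNet-style feature extractor
        self.depth_branch = depth_branch  # same topology, fed a 3-channel depth map

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        f_rgb = self.rgb_branch(rgb)
        f_depth = self.depth_branch(depth)
        # Parameter-free fusion: element-wise sum of the two feature maps.
        # Concatenation followed by a 1x1 convolution is a common alternative
        # that adds only a small number of parameters.
        return f_rgb + f_depth


if __name__ == "__main__":
    # Minimal usage example with toy branches standing in for the real backbones.
    toy_rgb = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
    toy_depth = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
    model = RGBDFusionBackbone(toy_rgb, toy_depth)
    fused = model(torch.randn(1, 3, 300, 300), torch.randn(1, 3, 300, 300))
    print(fused.shape)  # torch.Size([1, 16, 300, 300])
```

The fused feature maps would then feed the SSD detection heads as usual; depth maps are assumed here to be preprocessed (e.g. normalized and replicated to three channels) so that both branches can share the same topology.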
Pages: 12111 - 12138
Page count: 27
Related Papers
50 records in total
  • [21] M2FNet: Multi-modal fusion network for object detection from visible and thermal infrared images
    Jiang, Chenchen
    Ren, Huazhong
    Yang, Hong
    Huo, Hongtao
    Zhu, Pengfei
    Yao, Zhaoyuan
    Li, Jing
    Sun, Min
    Yang, Shihao
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 130
  • [22] Multi-modal object detection using unsupervised transfer learning and adaptation techniques
    Abbott, Rachael
    Robertson, Neil
    del Rincon, Jesus Martinez
    Connor, Barry
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN DEFENSE APPLICATIONS, 2019, 11169
  • [23] Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
    Xu, Yifan
    Zhang, Mengdan
    Yang, Xiaoshan
    Xu, Changsheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 6253 - 6267
  • [24] UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection
    Guo, Ruohao
    Ying, Xianghua
    Qi, Yanyu
    Qu, Liao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7622 - 7635
  • [25] Multi-Modal Detection of Man-Made Objects in Simulated Aerial Images
    Baran, Matthew S.
    Tutwiler, Richard L.
    Natale, Donald J.
    Bassett, Michael S.
    Harner, Matthew P.
    ALGORITHMS AND TECHNOLOGIES FOR MULTISPECTRAL, HYPERSPECTRAL, AND ULTRASPECTRAL IMAGERY XIX, 2013, 8743
  • [26] Multi-Modal Weights Sharing and Hierarchical Feature Fusion for RGBD Salient Object Detection
    Xiao, Fen
    Li, Bin
    Peng, Yimu
    Cao, Chunhong
    Hu, Kai
    Gao, Xieping
    IEEE ACCESS, 2020, 8 : 26602 - 26611
  • [27] Small Object Detection Technology Using Multi-Modal Data Based on Deep Learning
    Park, Chi-Won
    Seo, Yuri
    Sun, Teh-Jen
    Lee, Ga-Won
    Huh, Eui-Nam
    2023 INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN, 2023, : 420 - 422
  • [28] Cloud and Cloud Shadow Detection for Multi-Modal Imagery With Gap-Filling Applications
    Cho, Keunhoo
    Park, Seongwook
    Seong, Boram
    Lee, Seongwhan
    Park, Jae-Pil
    IEEE ACCESS, 2025, 13 : 7396 - 7406
  • [29] Object detection based on multi-modal adaptive fusion using YOLOv3
    Sheikh, Aarfa Bano
    Baru, Apurva
    Desai, Sanjana Shinde
    Mangale, Supriya
    JOURNAL OF APPLIED REMOTE SENSING, 2022, 16 (02)
  • [30] Multi-modal feature fusion for 3D object detection in the production workshop
    Hou, Rui
    Chen, Guangzhu
    Han, Yinhe
    Tang, Zaizuo
    Ru, Qingjun
    APPLIED SOFT COMPUTING, 2022, 115