Industrial object detection with multi-modal SSD: closing the gap between synthetic and real images

Authors
Julia Cohen
Carlos Crispim-Junior
Jean-Marc Chiappa
Laure Tougne Rodet
Affiliations
[1] Université de Lyon
[2] Univ Lyon 2
[3] CNRS
[4] Centrale Lyon
[5] INSA Lyon
[6] UCBL
[7] LIRIS
[8] UMR5205
[9] DEMS
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Object detection; Deep learning; Synthetic dataset; Industrial; RGB-D
DOI
Not available
Abstract
Object detection for industrial applications faces challenges that state-of-the-art deep learning models have yet to solve. Such applications usually lack training data, and the common solution of using a synthetic dataset introduces a domain gap when the model is given real images. Besides, few architectures fit in the small memory of a mobile device and run in real time with limited computation capabilities. The models that fulfill these requirements generally have low learning capacity, and the domain gap further reduces their performance. In this work, we propose multiple strategies to reduce the domain gap when using RGB-D images, and to increase the overall performance of a Convolutional Neural Network (CNN) for object detection with a reasonable increase in model size. First, we propose a new architecture based on the Single Shot Detector (SSD), and we compare different fusion methods to increase performance with few or no additional parameters. We applied the proposed method to three synthetic datasets with different visual characteristics, and we show that classical image processing significantly reduces the domain gap for depth maps. Our experiments show an improvement when fusing RGB and depth images on two benchmark datasets, even when the depth maps contain little discriminative information. Our RGB-D SSD Lite model performs on par with or better than a ResNet-FPN RetinaNet model on the LINEMOD and T-LESS datasets, while requiring 20 times less computation. Finally, we provide some insights on training a robust model for improved performance when one of the modalities is missing.
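To make the phrase "fusion methods with few or no additional parameters" concrete, the following is a minimal NumPy sketch of three generic ways to merge an RGB feature map with a depth feature map. It is an illustration only, not the authors' architecture: the function names, the feature-map shape, and the choice of element-wise sum, element-wise maximum, and concatenation followed by a 1x1 projection are all assumptions made for this example.

```python
import numpy as np


def fuse_add(rgb_feat, depth_feat):
    """Element-wise sum: adds no learnable parameters."""
    return rgb_feat + depth_feat


def fuse_max(rgb_feat, depth_feat):
    """Element-wise maximum: adds no learnable parameters."""
    return np.maximum(rgb_feat, depth_feat)


def fuse_concat_1x1(rgb_feat, depth_feat, weight):
    """Concatenate along channels, then project back with a 1x1 convolution.

    weight has shape (C, 2C), so this variant adds 2 * C * C parameters.
    """
    stacked = np.concatenate([rgb_feat, depth_feat], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel matrix product over the channel axis.
    return np.einsum("oc,chw->ohw", weight, stacked)


# Hypothetical feature maps with C channels and an H x W spatial grid.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
rgb_feat = rng.standard_normal((C, H, W))
depth_feat = rng.standard_normal((C, H, W))
weight = rng.standard_normal((C, 2 * C)) * 0.1  # hypothetical projection

fused_add = fuse_add(rgb_feat, depth_feat)
fused_max = fuse_max(rgb_feat, depth_feat)
fused_proj = fuse_concat_1x1(rgb_feat, depth_feat, weight)
extra_params = weight.size  # parameters added by the projection variant
```

All three variants return a map with the same shape as either input, so the detection layers that follow need no change; only the projection variant increases the model size.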
Pages: 12111-12138
Number of pages: 27
Related papers
50 in total
  • [1] Industrial object detection with multi-modal SSD: closing the gap between synthetic and real images
    Cohen, Julia
    Crispim-Junior, Carlos
    Chiappa, Jean-Marc
    Rodet, Laure Tougne
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (04) : 12111 - 12138
  • [2] Object detection in multi-modal images using genetic programming
    Bhanu, B
    Lin, YQ
    APPLIED SOFT COMPUTING, 2004, 4 (02) : 175 - 201
  • [3] Deep Multi-modal Object Detection for Autonomous Driving
    Ennajar, Amal
    Khouja, Nadia
    Boutteau, Remi
    Tlili, Fethi
    2021 18TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2021, : 7 - 11
  • [4] Multi-Modal Prototypes for Few-Shot Object Detection in Remote Sensing Images
    Liu, Yanxing
    Pan, Zongxu
    Yang, Jianwei
    Zhou, Peiling
    Zhang, Bingchen
    REMOTE SENSING, 2024, 16 (24)
  • [5] Multi-modal object detection via transformer network
    Liu, Wenbing
    Wang, Haibo
    Gao, Quanxue
    Zhu, Zhaorui
    IET IMAGE PROCESSING, 2023, 17 (12) : 3541 - 3550
  • [6] Multi-Modal Dataset Generation using Domain Randomization for Object Detection
    Marez, Diego
    Nans, Lena
    Borden, Samuel
    GEOSPATIAL INFORMATICS XI, 2021, 11733
  • [7] Closing the domain gap: blended synthetic imagery for climate object detection
    Kornfein, Caleb
    Willard, Frank
    Tang, Caroline
    Long, Yuxi
    Jain, Saksham
    Malof, Jordan
    Ren, Simiao
    Bradbury, Kyle
    ENVIRONMENTAL DATA SCIENCE, 2023, 2
  • [8] Deep learning based object detection from multi-modal sensors: an overview
    Ye Liu
    Shiyang Meng
    Hongzhang Wang
    Jun Liu
    Multimedia Tools and Applications, 2024, 83 : 19841 - 19870
  • [9] Deep learning based object detection from multi-modal sensors: an overview
    Liu, Ye
    Meng, Shiyang
    Wang, Hongzhang
    Liu, Jun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 19841 - 19870
  • [10] Class-Agnostic Object Detection with Multi-modal Transformer
    Maaz, Muhammad
    Rasheed, Hanoona
    Khan, Salman
    Khan, Fahad Shahbaz
    Anwer, Rao Muhammad
    Yang, Ming-Hsuan
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 512 - 531