Industrial object detection with multi-modal SSD: closing the gap between synthetic and real images

Cited by: 0
Authors
Julia Cohen
Carlos Crispim-Junior
Jean-Marc Chiappa
Laure Tougne Rodet
Affiliations
[1] Université de Lyon
[2] Univ Lyon 2
[3] CNRS
[4] Centrale Lyon
[5] INSA Lyon
[6] UCBL
[7] LIRIS
[8] UMR5205
[9] DEMS
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Object detection; Deep learning; Synthetic dataset; Industrial; RGB-D
DOI
Not available
Abstract
Object detection for industrial applications faces challenges that state-of-the-art deep learning models have yet to solve. Industrial applications usually lack training data, and the common solution of using a synthetic dataset introduces a domain gap when the model is given real images. Besides, few architectures fit in the small memory of a mobile device and run in real time with limited computation capabilities. The models fulfilling these requirements generally have low learning capacity, and the domain gap further reduces performance. In this work, we propose multiple strategies to reduce the domain gap when using RGB-D images, and to increase the overall performance of a Convolutional Neural Network (CNN) for object detection with a reasonable increase in model size. First, we propose a new architecture based on the Single Shot Detector (SSD) architecture, and we compare different fusion methods to increase the performance with few or no additional parameters. We applied the proposed method to three synthetic datasets with different visual characteristics, and we show that classical image processing significantly reduces the domain gap for depth maps. Our experiments show an improvement when fusing RGB and depth images for two benchmark datasets, even when the depth maps contain little discriminative information. Our RGB-D SSD Lite model performs on par with or better than a ResNet-FPN RetinaNet model on the LINEMOD and T-LESS datasets, while requiring 20 times less computation. Finally, we provide some insights on training a robust model for improved performance when one of the modalities is missing.
Pages: 12111 - 12138
Page count: 27
Related papers
50 in total
  • [31] Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
    Wu, Jiajia
    Han, Guangliang
    Wang, Haining
    Yang, Hang
    Li, Qingqing
    Liu, Dongxu
    Ye, Fangjian
    Liu, Peixun
    IEEE ACCESS, 2021, 9 : 150608 - 150622
  • [32] Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel
    Gao, Xin
    Zhang, Guoying
    Xiong, Yijin
    MEASUREMENT, 2022, 194
  • [33] Bridging the View Disparity Between Radar and Camera Features for Multi-Modal Fusion 3D Object Detection
    Zhou, Taohua
    Chen, Junjie
    Shi, Yining
    Jiang, Kun
    Yang, Mengmeng
    Yang, Diange
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8 (02) : 1523 - 1535
  • [34] Unsupervised Domain Adaptation from Synthetic to Real Images for Anchorless Object Detection
    Scheck, Tobias
    Grassi, Ana Perez
    Hirtz, Gangolf
    VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 5: VISAPP, 2021 : 319 - 327
  • [35] Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection
    Li, Jiahao
    Chen, Lingshan
    Li, Zhen
    IEEE ACCESS, 2025, 13 : 52385 - 52396
  • [36] Multi-Modal System for Walking Safety for the Visually Impaired: Multi-Object Detection and Natural Language Generation
    Lee, Jekyung
    Cha, Kyung-Ae
    Lee, Miran
    APPLIED SCIENCES-BASEL, 2024, 14 (17)
  • [37] Multi-modal Data Analysis and Fusion for Robust Object Detection in 2D/3D Sensing
    Schierl, Jonathan
    Graehling, Quinn
    Aspiras, Theus
    Asari, Vijay
    Van Rynbach, Andre
    Rabb, Dave
    2020 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR): TRUSTED COMPUTING, PRIVACY, AND SECURING MULTIMEDIA, 2020
  • [38] MMDistill: Multi-Modal BEV Distillation Framework for Multi-View 3D Object Detection
    Jiao, Tianzhe
    Chen, Yuming
    Zhang, Zhe
    Guo, Chaopeng
    Song, Jie
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 81 (03) : 4307 - 4325
  • [39] PV-SSD: A Multi-Modal Point Cloud 3D Object Detector Based on Projection Features and Voxel Features
    Shao, Yongxin
    Tan, Aihong
    Sun, Zhetao
    Zheng, Enhui
    Yan, Tianhong
    Liao, Peng
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (05) : 3436 - 3449
  • [40] LIVER TUMOR DETECTION VIA A MULTI-SCALE INTERMEDIATE MULTI-MODAL FUSION NETWORK ON MRI IMAGES
    Pan, Chao
    Zhou, Peiyun
    Tan, Jingru
    Sun, Baoye
    Guan, Ruoyu
    Wang, Zhutao
    Luo, Ye
    Lu, Jianwei
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021 : 299 - 303