Industrial object detection with multi-modal SSD: closing the gap between synthetic and real images

Authors
Julia Cohen
Carlos Crispim-Junior
Jean-Marc Chiappa
Laure Tougne Rodet
Affiliations
[1] Université de Lyon
[2] Univ Lyon 2
[3] CNRS
[4] Centrale Lyon
[5] INSA Lyon
[6] UCBL
[7] LIRIS
[8] UMR5205
[9] DEMS
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Object detection; Deep learning; Synthetic dataset; Industrial; RGB-D
DOI
Not available
Abstract
Object detection for industrial applications faces challenges that state-of-the-art deep learning models have yet to solve. Such applications usually lack training data, and the common solution of using a synthetic dataset introduces a domain gap when the model is given real images. Besides, few architectures fit in the small memory of a mobile device and run in real time with limited computation capabilities. The models fulfilling these requirements generally have low learning capacity, and the domain gap further reduces performance. In this work, we propose multiple strategies to reduce the domain gap when using RGB-D images, and to increase the overall performance of a Convolutional Neural Network (CNN) for object detection with a reasonable increase in model size. First, we propose a new architecture based on the Single Shot Detector (SSD) architecture, and we compare different fusion methods to increase performance with few or no additional parameters. We applied the proposed method to three synthetic datasets with different visual characteristics, and we show that classical image processing significantly reduces the domain gap for depth maps. Our experiments show an improvement when fusing RGB and depth images for two benchmark datasets, even when the depth maps contain little discriminative information. Our RGB-D SSD Lite model performs on par with or better than a ResNet-FPN RetinaNet model on the LINEMOD and T-LESS datasets, while requiring 20 times less computation. Finally, we provide some insights on training a robust model for improved performance when one of the modalities is missing.
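To make the phrase "fusion methods ... with few or no additional parameters" concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that RGB and depth feature maps have the same shape, and it contrasts element-wise fusion (sum, max, mean), which adds no learnable parameters, with channel concatenation, which widens the input of whatever layer follows. The function names `fuse_elementwise`, `fuse_concat`, and `conv1x1_params` are hypothetical, and the record does not state which fusion operators the paper actually compares.

```python
from typing import Callable, Dict, List

# A feature map is stored as nested lists: channels x height x width.
FeatureMap = List[List[List[float]]]


def fuse_elementwise(rgb: FeatureMap, depth: FeatureMap, mode: str = "sum") -> FeatureMap:
    """Fuse two same-shaped feature maps without any learnable parameter."""
    ops: Dict[str, Callable[[float, float], float]] = {
        "sum": lambda a, b: a + b,
        "max": max,
        "mean": lambda a, b: (a + b) / 2,
    }
    if mode not in ops:
        raise ValueError(f"unknown fusion mode: {mode}")
    if len(rgb) != len(depth):
        raise ValueError("element-wise fusion needs the same number of channels")
    op = ops[mode]
    return [
        [[op(a, b) for a, b in zip(r_row, d_row)] for r_row, d_row in zip(r_ch, d_ch)]
        for r_ch, d_ch in zip(rgb, depth)
    ]


def fuse_concat(rgb: FeatureMap, depth: FeatureMap) -> FeatureMap:
    """Stack the two feature maps along the channel axis."""
    return rgb + depth


def conv1x1_params(in_channels: int, out_channels: int) -> int:
    """Weights plus biases of a 1x1 convolution that consumes the fused map."""
    return in_channels * out_channels + out_channels


# Toy example: 2 channels, 1x2 spatial size (values are made up).
rgb_features = [[[1.0, 2.0]], [[3.0, 4.0]]]
depth_features = [[[0.5, 5.0]], [[1.0, 1.0]]]

summed = fuse_elementwise(rgb_features, depth_features, "sum")
stacked = fuse_concat(rgb_features, depth_features)

# Element-wise fusion keeps the channel count, so the next layer is unchanged;
# concatenation doubles it, so the next layer needs more weights.
print(len(summed), conv1x1_params(len(summed), 2))
print(len(stacked), conv1x1_params(len(stacked), 2))
```

In this toy setting the element-wise variants leave the following layer's size untouched, whereas concatenation roughly doubles its weight count, which is one way a fusion choice can trade accuracy against model size.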
Pages: 12111 - 12138
Page count: 27