DDet3D: embracing 3D object detector with diffusion

Citations: 0
Authors
Erabati, Gopi Krishna [1]
Araujo, Helder [1]
Affiliations
[1] Univ Coimbra, Inst Syst & Robot, Rua Silvio Lima,Polo 2, P-3030290 Coimbra, Portugal
Keywords
3D object detection; Diffusion; LiDAR; Autonomous driving; Computer vision;
DOI
10.1007/s10489-024-06045-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing approaches to 3D object detection rely on heuristic or learnable object proposals, which must be optimised during training. In our approach, we replace the hand-crafted or learnable object proposals with randomly generated ones by formulating a new paradigm that employs a diffusion model to detect 3D objects from a set of randomly generated and supervised learning-based object proposals in an autonomous driving application. We propose DDet3D, a diffusion-based 3D object detection framework that formulates 3D object detection as a generative task over the 3D bounding box coordinates in 3D space. To our knowledge, this work is the first to formulate 3D object detection with a denoising diffusion model and to establish that randomly generated and supervised learning-based 3D proposals (as distinct from empirical anchors or learnt queries) are also potential object candidates for 3D object detection. During training, random noisy 3D boxes are obtained from the 3D ground truth boxes by progressively adding Gaussian noise, and the DDet3D network is trained to reverse this diffusion process. During inference, the DDet3D network iteratively refines the randomly generated and supervised learning-based noisy 3D boxes to predict 3D bounding boxes conditioned on the LiDAR Bird's Eye View (BEV) features. An advantage of DDet3D is that it decouples the training and inference stages, so a larger number of proposal boxes or sampling steps can be used during inference to improve accuracy. We conduct extensive experiments and analysis on the nuScenes and KITTI datasets. DDet3D achieves competitive performance compared to well-designed 3D object detectors. Our work serves as a strong baseline for exploring and employing more efficient diffusion models for 3D perception tasks.
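The training-time corruption described in the abstract (progressively adding Gaussian noise to ground truth boxes) can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine noise schedule, the 7-value box layout (x, y, z, length, width, height, yaw), the normalised coordinates, and the function names `cosine_alpha_bar` and `noise_boxes` are all assumptions introduced here for illustration.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal level at step t under a cosine schedule (assumed, not from the paper)."""
    def f(u):
        return math.cos(((u / num_steps) + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def noise_boxes(gt_boxes, t, num_steps, rng):
    """Forward diffusion: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, with eps ~ N(0, 1)."""
    a = cosine_alpha_bar(t, num_steps)
    return [
        [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in box]
        for box in gt_boxes
    ]


rng = random.Random(0)
# Hypothetical normalised box: (x, y, z, length, width, height, yaw)
gt_boxes = [[0.2, -0.1, 0.0, 0.4, 0.2, 0.15, 0.3]]

clean = noise_boxes(gt_boxes, t=0, num_steps=1000, rng=rng)     # no noise at t = 0
noisy = noise_boxes(gt_boxes, t=1000, num_steps=1000, rng=rng)  # almost pure noise at t = T
```

At `t = 0` the boxes are returned unchanged, while at the final step they are nearly pure Gaussian noise. In a scheme of this kind, the boxes at the final step are statistically similar to the random boxes a detector would start from at inference time.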
Pages: 16
Related papers
75 in total
  • [1] Austin J, 2021, ADV NEUR IN
  • [2] Baranchuk D., 2022, INT C LEARN REPR
  • [3] Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
    Bond-Taylor, Sam
    Hessey, Peter
    Sasaki, Hiroshi
    Breckon, Toby P.
    Willcocks, Chris G.
    [J]. COMPUTER VISION, ECCV 2022, PT XXIII, 2022, 13683 : 170 - 188
  • [4] Caesar H, 2020, PROC CVPR IEEE, P11618, DOI 10.1109/CVPR42600.2020.01164
  • [5] Carion Nicolas, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12346), P213, DOI 10.1007/978-3-030-58452-8_13
  • [6] Chen Q., 2021, P ADV NEUR INF PROC, V34, P26871
  • [7] Chen Q., 2020, Adv Neural Inf Process Syst, P21224
  • [8] DiffusionDet: Diffusion Model for Object Detection
    Chen, Shoufa
    Sun, Peize
    Song, Yibing
    Luo, Ping
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 19773 - 19786
  • [9] Chen T., 2023, 11 INT C LEARN REPR
  • [10] A Generalist Framework for Panoptic Segmentation of Images and Videos
    Chen, Ting
    Li, Lala
    Saxena, Saurabh
    Hinton, Geoffrey
    Fleet, David J.
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 909 - 919