CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

Cited by: 1
Authors
Tseng, Ching-Yu [1]
Chen, Yi-Rong [1]
Lee, Hsin-Ying [1]
Wu, Tsung-Han [1]
Chen, Wen-Chin [1]
Hsu, Winston H. [1, 2]
Affiliations
[1] National Taiwan University, Taipei, Taiwan
[2] Mobile Drive Technology, Amsterdam, Netherlands
DOI
10.1109/ICRA48891.2023.10161451
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
To achieve accurate 3D object detection at low cost for autonomous driving, many multi-camera methods have been proposed, solving the occlusion problem of monocular approaches. However, lacking accurate depth estimates, existing multi-camera methods often generate multiple bounding boxes along a ray in the depth direction for difficult small objects such as pedestrians, resulting in extremely low recall. Furthermore, directly adding depth prediction modules to existing multi-camera methods, which are generally built on large network architectures, cannot meet the real-time requirements of self-driving applications. To address these issues, we propose CrossDTR (Cross-view and Depth-guided Transformers for 3D Object Detection). First, a lightweight depth predictor produces precise object-wise sparse depth maps and low-dimensional depth embeddings without requiring extra depth datasets for supervision. Second, a cross-view depth-guided transformer fuses the depth embeddings with image features from cameras of different views and generates 3D bounding boxes. Extensive experiments demonstrate that our method surpasses existing multi-camera methods by 10 percent in pedestrian detection and by about 3 percent in overall mAP and NDS. Computational analyses also show that our method is 5 times faster than prior approaches. Our code will be made publicly available at https://github.com/sty61010/CrossDTR.
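The depth ambiguity mentioned in the abstract can be illustrated with the standard pinhole camera model. The sketch below is not taken from the CrossDTR code; the `Intrinsics` type, the `back_project` helper, and all numeric values are hypothetical and chosen only for illustration. It shows that one pixel maps to a different 3D point for each assumed depth, and that all of these points lie on a single ray, which is why a detector without a reliable depth estimate may place several boxes along that ray.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Pinhole camera intrinsics in pixels (hypothetical example values below)."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Lift pixel (u, v) at the given depth to a 3D point in the camera frame."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


# Assumed intrinsics and pixel location, for illustration only.
k = Intrinsics(fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)
pixel = (900.0, 500.0)

# Without a depth estimate, every depth hypothesis is a plausible object center.
candidates = [back_project(pixel[0], pixel[1], d, k) for d in (10.0, 20.0, 40.0)]
for point in candidates:
    print(point)
```

Every candidate shares the same ratios x/z and y/z, so the points are collinear with the camera center. A per-object depth estimate selects one point on the ray, which is the role the abstract assigns to the depth predictor.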
Pages: 4850-4857
Number of pages: 8