Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection

Cited by: 15
Authors
Chen, Zehui [1 ]
Li, Zhenyu [2 ]
Zhang, Shiquan [3 ]
Fang, Liangji [3 ]
Jiang, Qinhong [3 ]
Zhao, Feng [1 ]
Affiliations
[1] Univ Sci & Tech China, Hefei, Peoples R China
[2] Harbin Inst Technol, Harbin, Peoples R China
[3] SenseTime Res, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
3D object detection; Multi-view Detection; Transformer
DOI
10.1145/3503161.3547859
CLC number
TP39 [Computer Applications]
Discipline codes
081203; 0835
Abstract
3D object detection from multiple image views is a fundamental and challenging task for visual scene understanding. However, accurately detecting objects through perspective views in 3D space is extremely difficult due to the lack of depth information. Recently, DETR3D [50] introduced a novel 3D-2D query paradigm for aggregating multi-view images for 3D object detection and achieves state-of-the-art performance. In this paper, with intensive pilot experiments, we quantify the objects located in different regions and find that the "truncated instances" (i.e., those at the border regions of each image) are the main bottleneck hindering the performance of DETR3D. Although it merges multiple features from two adjacent views in the overlapping regions, DETR3D still suffers from insufficient feature aggregation, thus missing the chance to fully boost the detection performance. To tackle this problem, we propose Graph-DETR3D, which automatically aggregates multi-view imagery information through graph structure learning. It constructs a dynamic 3D graph between each object query and the 2D feature maps to enhance the object representations, especially at the border regions. Besides, Graph-DETR3D benefits from a novel depth-invariant multi-scale training strategy, which maintains visual depth consistency by simultaneously scaling the image size and the object depth. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and efficiency of Graph-DETR3D. Notably, our best model achieves 49.5 NDS on the nuScenes test leaderboard, setting a new state of the art among published image-view 3D object detectors.
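As a reading aid for the abstract, the PyTorch sketch below illustrates the 3D-2D query aggregation it describes: each object query samples a small set of 3D points around its reference location (the nodes of the dynamic graph), projects them into every camera view, bilinearly samples image features at the projected positions, and averages the valid samples. This is a minimal sketch under assumed tensor shapes and an assumed helper name (aggregate_multiview_features); it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def aggregate_multiview_features(feat_maps, ref_points, offsets, lidar2img, img_hw):
    """Illustrative sketch only (shapes and name are assumptions, not the paper's code).

    feat_maps : (N_cam, C, H, W)  per-view feature maps
    ref_points: (Q, 3)            3D reference point of each object query
    offsets   : (Q, P, 3)         learned 3D offsets forming the local graph nodes
    lidar2img : (N_cam, 4, 4)     projection matrices from 3D space to each image
    img_hw    : (H_img, W_img)    original image resolution
    returns   : (Q, C)            aggregated feature per query
    """
    n_cam, C = feat_maps.shape[0], feat_maps.shape[1]
    pts = ref_points[:, None, :] + offsets                      # (Q, P, 3) graph nodes
    Q, P, _ = pts.shape
    pts_h = torch.cat([pts, torch.ones_like(pts[..., :1])], -1) # homogeneous coordinates

    feats, weights = [], []
    for cam in range(n_cam):
        proj = pts_h.reshape(-1, 4) @ lidar2img[cam].T          # project into this view
        depth = proj[:, 2:3].clamp(min=1e-5)
        uv = proj[:, :2] / depth                                 # pixel coordinates
        # normalize pixel coordinates to [-1, 1] for grid_sample (x first, then y)
        grid = torch.stack([uv[:, 0] / img_hw[1] * 2 - 1,
                            uv[:, 1] / img_hw[0] * 2 - 1], dim=-1)
        valid = ((grid.abs() <= 1).all(dim=-1) & (proj[:, 2] > 0)).float()
        sampled = F.grid_sample(feat_maps[cam:cam + 1],
                                grid.view(1, Q * P, 1, 2),
                                align_corners=False)             # (1, C, Q*P, 1)
        feats.append(sampled.view(C, Q, P).permute(1, 2, 0) * valid.view(Q, P, 1))
        weights.append(valid.view(Q, P, 1))

    feats = torch.stack(feats).sum(dim=(0, 2))                   # sum over views and nodes
    weights = torch.stack(weights).sum(dim=(0, 2)).clamp(min=1)  # count of valid samples
    return feats / weights                                       # average over valid samples
```

The depth-invariant multi-scale training mentioned in the abstract can be understood through the pinhole relation h ≈ f·H/d: resizing an image changes the apparent size of every object, so the object depth is rescaled together with the image to preserve the size-to-depth correspondence that the network relies on as a depth cue.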
Pages: 5999-6008
Page count: 10