Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection

Cited by: 15
Authors
Chen, Zehui [1]
Li, Zhenyu [2]
Zhang, Shiquan [3]
Fang, Liangji [3]
Jiang, Qinhong [3]
Zhao, Feng [1]
Affiliations
[1] Univ Sci & Tech China, Hefei, Peoples R China
[2] Harbin Inst Technol, Harbin, Peoples R China
[3] SenseTime Res, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
3D object detection; Multi-view Detection; Transformer;
DOI
10.1145/3503161.3547859
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
3D object detection from multiple image views is a fundamental and challenging task for visual scene understanding. However, accurately detecting objects through perspective views in the 3D space is extremely difficult due to the lack of depth information. Recently, DETR3D [50] introduces a novel 3D-2D query paradigm in aggregating multi-view images for 3D object detection and achieves state-of-the-art performance. In this paper, with intensive pilot experiments, we quantify the objects located at different regions and find that the "truncated instances" (i.e., at the border regions of each image) are the main bottleneck hindering the performance of DETR3D. Although it merges multiple features from two adjacent views in the overlapping regions, DETR3D still suffers from insufficient feature aggregation, thus missing the chance to fully boost the detection performance. In an effort to tackle the problem, we propose Graph-DETR3D to automatically aggregate multi-view imagery information through graph structure learning. It constructs a dynamic 3D graph between each object query and 2D feature maps to enhance the object representations, especially at the border regions. Besides, Graph-DETR3D benefits from a novel depth-invariant multi-scale training strategy, which maintains the visual depth consistency by simultaneously scaling the image size and the object depth. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and efficiency of our Graph-DETR3D. Notably, our best model achieves 49.5 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various published image-view 3D object detectors.
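The depth-invariant multi-scale idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is one plausible reading under a simple pinhole camera model, where an object of height H at depth Z spans f * H / Z pixels. Resizing the image by a factor s multiplies that pixel span by s, which matches an object at depth Z / s seen through the unchanged focal length f. The function names `apparent_height_px` and `rescale_sample`, and the choice to keep the focal length fixed, are assumptions made for illustration.

```python
from typing import Tuple


def apparent_height_px(focal_px: float, height_m: float, depth_m: float) -> float:
    """Pixel height of an object under a pinhole model: f * H / Z."""
    return focal_px * height_m / depth_m


def rescale_sample(
    image_hw: Tuple[int, int], depth_m: float, scale: float
) -> Tuple[Tuple[int, int], float]:
    """Scale the image size and the object depth together (hypothetical helper).

    Assumption: the camera focal length is treated as fixed, so an image
    resized by `scale` is paired with a depth target of depth / scale.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    h, w = image_hw
    new_hw = (round(h * scale), round(w * scale))
    return new_hw, depth_m / scale


# Example: halve the image, double the depth target.
new_hw, new_depth = rescale_sample((900, 1600), 20.0, 0.5)

# The object looks half as tall in the resized image ...
resized_px = 0.5 * apparent_height_px(1000.0, 1.5, 20.0)
# ... which is what the fixed camera would see at the rescaled depth.
consistent_px = apparent_height_px(1000.0, 1.5, new_depth)
```

In this sketch the apparent size in the resized image and the rescaled depth label agree with each other, which is the consistency the abstract refers to; the paper itself should be consulted for the exact formulation.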
Pages: 5999 - 6008
Page count: 10
Related papers
50 in total
  • [31] SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
    Tang, Yingqi
    Meng, Zhaotie
    Chen, Guoliang
    Cheng, Erkang
    COMPUTER VISION - ECCV 2024, PT II, 2025, 15060 : 1 - 17
  • [32] BEVFusion With Dual Hard Instance Probing for Multimodal 3D Object Detection
    Kim, Taeho
    Kim, Joohee
    IEEE ACCESS, 2025, 13 : 25546 - 25556
  • [33] Transformer-Based Global PointPillars 3D Object Detection Method
    Zhang, Lin
    Meng, Hua
    Yan, Yunbing
    Xu, Xiaowei
    ELECTRONICS, 2023, 12 (14)
  • [34] Pseudo-Mono for Monocular 3D Object Detection in Autonomous Driving
    Tao, Chongben
    Cao, Jiecheng
    Wang, Chen
    Zhang, Zufeng
    Gao, Zhen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 3962 - 3975
  • [35] BEV transformer for visual 3D object detection applied with retentive mechanism
    Pan, Jincheng
    Huang, Xiaoci
    Luo, Suyun
    Ma, Fang
    TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2025,
  • [36] ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection
    Lu, Chenguang
    Yue, Kang
    Liu, Yue
    COMPUTER VISION - ACCV 2022, PT I, 2023, 13841 : 262 - 279
  • [37] TBFNT3D: Two-Branch Fusion Network With Transformer for Multimodal Indoor 3D Object Detection
    Cheng, Jun
    Zhang, Sheng
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6523 - 6530
  • [38] Corn pose estimation using 3D object detection and stereo images
    Gao, Yuliang
    Li, Zhen
    Hong, Qingqing
    Li, Bin
    Zhang, Lifeng
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2025, 231
  • [39] CT3D++: Improving 3D Object Detection with Keypoint-Induced Channel-wise Transformer
    Sheng, Hualian
    Cai, Sijia
    Zhao, Na
    Deng, Bing
    Liang, Qiao
    Zhao, Min-Jian
    Ye, Jieping
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, : 4817 - 4836
  • [40] Fusion information enhanced method based on transformer for 3D object detection
    Jin Y.
    Tao C.
    Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2023, 44 (12) : 297 - 306