Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection

Cited by: 15
Authors
Chen, Zehui [1 ]
Li, Zhenyu [2 ]
Zhang, Shiquan [3 ]
Fang, Liangji [3 ]
Jiang, Qinhong [3 ]
Zhao, Feng [1 ]
Affiliations
[1] Univ Sci & Tech China, Hefei, Peoples R China
[2] Harbin Inst Technol, Harbin, Peoples R China
[3] SenseTime Res, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
3D object detection; Multi-view Detection; Transformer;
DOI
10.1145/3503161.3547859
CLC Classification (Chinese Library Classification)
TP39 [Computer Applications]
Discipline Classification Code
081203; 0835
Abstract
3D object detection from multiple image views is a fundamental and challenging task for visual scene understanding. However, accurately detecting objects in 3D space from perspective views is extremely difficult due to the lack of depth information. Recently, DETR3D [50] introduced a novel 3D-2D query paradigm for aggregating multi-view images for 3D object detection and achieved state-of-the-art performance. In this paper, through intensive pilot experiments, we quantify detection performance on objects located in different image regions and find that "truncated instances" (i.e., those at the border regions of each image) are the main bottleneck hindering the performance of DETR3D. Although it merges features from two adjacent views in the overlapping regions, DETR3D still suffers from insufficient feature aggregation and thus misses the chance to fully boost detection performance. To tackle this problem, we propose Graph-DETR3D, which automatically aggregates multi-view imagery information through graph structure learning. It constructs a dynamic 3D graph between each object query and the 2D feature maps to enhance the object representations, especially at the border regions. In addition, Graph-DETR3D benefits from a novel depth-invariant multi-scale training strategy, which maintains visual depth consistency by simultaneously scaling the image size and the object depth. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and efficiency of Graph-DETR3D. Notably, our best model achieves 49.5 NDS on the nuScenes test leaderboard, setting a new state of the art among published image-view 3D object detectors.
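The core mechanism described in the abstract (projecting a small 3D graph of sampled points around each object query into every camera view and pooling the resulting 2D features) can be illustrated with a short PyTorch sketch. The sketch below is a hypothetical illustration under stated assumptions, not the authors' code: the function name, argument layout, and the final masked mean are assumptions for readability (the paper learns the aggregation weights and samples across multiple feature scales), and only a single-scale feature map per view is shown.

    import torch
    import torch.nn.functional as F

    def graph_feature_aggregation(ref_points, offsets, feat_maps, lidar2img, img_hw):
        # ref_points: (Q, 3) 3D reference point of each object query (ego frame).
        # offsets:    (Q, S, 3) offsets spanning a local graph of S sampled nodes per query.
        # feat_maps:  (V, C, H, W) feature maps of the V camera views (one scale shown).
        # lidar2img:  (V, 4, 4) projection matrices from the 3D frame to image pixels.
        # img_hw:     (img_h, img_w) input image size used for normalisation.
        Q, S, _ = offsets.shape
        V, C, _, _ = feat_maps.shape
        img_h, img_w = img_hw

        # Graph nodes: the reference point plus its sampled neighbours in 3D.
        nodes = ref_points[:, None, :] + offsets                          # (Q, S, 3)
        nodes_h = torch.cat([nodes, nodes.new_ones(Q, S, 1)], dim=-1)     # homogeneous

        # Project every node into every camera view.
        pts = torch.einsum('vij,qsj->vqsi', lidar2img, nodes_h)           # (V, Q, S, 4)
        depth = pts[..., 2:3].clamp(min=1e-5)
        uv = pts[..., :2] / depth                                         # pixel coordinates
        valid = (pts[..., 2] > 0) \
            & (uv[..., 0] >= 0) & (uv[..., 0] < img_w) \
            & (uv[..., 1] >= 0) & (uv[..., 1] < img_h)                    # (V, Q, S)

        # Bilinearly sample view features at the projected locations.
        grid = torch.stack([uv[..., 0] / img_w, uv[..., 1] / img_h], dim=-1) * 2 - 1
        feats = F.grid_sample(feat_maps, grid, align_corners=False)       # (V, C, Q, S)

        # Mask out-of-view projections and pool over views and graph nodes
        # (a plain masked mean here; the paper learns these aggregation weights).
        feats = feats * valid[:, None].float()
        denom = valid.sum(dim=(0, 2)).clamp(min=1).float()                # (Q,)
        return feats.sum(dim=(0, 3)).t() / denom[:, None]                 # (Q, C)

Truncated instances near image borders project into two adjacent views, so both views contribute valid samples to the same query, which is the effect the paper targets. The depth-invariant multi-scale training mentioned in the abstract would additionally rescale the object depth targets whenever the input images are resized; that step is not shown in this sketch.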
Pages: 5999 - 6008
Number of pages: 10
Related Papers
50 records in total
  • [41] HRNet: 3D object detection network for point cloud with hierarchical refinement
    Lu, Bin
    Sun, Yang
    Yang, Zhenyu
    Song, Ran
    Jiang, Haiyan
    Liu, Yonghuai
    PATTERN RECOGNITION, 2024, 149
  • [42] SA-MVSNet: Self-attention-based multi-view stereo network for 3D reconstruction of images with weak texture
    Yang, Ronghao
    Miao, Wang
    Zhang, Zhenxin
    Liu, Zhenlong
    Li, Mubai
    Lin, Bin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 131
  • [43] Multi-hop graph transformer network for 3D human pose estimation
    Islam, Zaedul
    Ben Hamza, A.
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 101
  • [44] AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer
    Zhang, Yan
    Liu, Kang
    Bao, Hong
    Qian, Xu
    Wang, Zihan
    Ye, Shiqing
    Wang, Weicen
    SENSORS, 2023, 23 (20)
  • [45] KPTr: Key point transformer for LiDAR-based 3D object detection
    Cao, Jie
    Peng, Yiqiang
    Wei, Hongqian
    Mo, Lingfan
    Fan, Likang
    Wang, Longfei
    MEASUREMENT, 2025, 242
  • [46] Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook
    Song, Ziying
    Liu, Lin
    Jia, Feiyang
    Luo, Yadan
    Jia, Caiyan
    Zhang, Guoxin
    Yang, Lei
    Wang, Li
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (11) : 15407 - 15436
  • [47] DeLiVoTr: Deep and light-weight voxel transformer for 3D object detection
    Erabati, Gopi Krishna
    Araujo, Helder
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2024, 22
  • [48] Two-stage 3D object detection guided by position encoding
    Xu, Wanpeng
    Zou, Ling
    Fu, Zhipeng
    Wu, Lingda
    Qi, Yue
    NEUROCOMPUTING, 2022, 501 : 811 - 821
  • [49] Robust BEV 3D Object Detection for Vehicles with Tire Blow-Out
    Yang, Dongsheng
    Fan, Xiaojie
    Dong, Wei
    Huang, Chaosheng
    Li, Jun
    SENSORS, 2024, 24 (14)
  • [50] URFormer: Unified Representation LiDAR-Camera 3D Object Detection with Transformer
    Zhang, Guoxin
    Xie, Jun
    Liu, Lin
    Wang, Zhepeng
    Yang, Kuihe
    Song, Ziying
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427 : 401 - 413