Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection

Cited by: 20

Authors:

Chen, Zehui [1]

Li, Zhenyu [2]

Zhang, Shiquan [3]

Fang, Liangji [3]

Jiang, Qinhong [3]

Zhao, Feng [1]

Affiliations:

[1] Univ Sci & Tech China, Hefei, Peoples R China

[2] Harbin Inst Technol, Harbin, Peoples R China

[3] SenseTime Res, Hong Kong, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

3D object detection; Multi-view Detection; Transformer;

DOI:

10.1145/3503161.3547859

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

3D object detection from multiple image views is a fundamental and challenging task for visual scene understanding. However, accurately detecting objects through perspective views in the 3D space is extremely difficult due to the lack of depth information. Recently, DETR3D [50] introduces a novel 3D-2D query paradigm in aggregating multi-view images for 3D object detection and achieves state-of-the-art performance. In this paper, with intensive pilot experiments, we quantify the objects located at different regions and find that the "truncated instances" (i.e., at the border regions of each image) are the main bottleneck hindering the performance of DETR3D. Although it merges multiple features from two adjacent views in the overlapping regions, DETR3D still suffers from insufficient feature aggregation, thus missing the chance to fully boost the detection performance. In an effort to tackle the problem, we propose Graph-DETR3D to automatically aggregate multi-view imagery information through graph structure learning. It constructs a dynamic 3D graph between each object query and 2D feature maps to enhance the object representations, especially at the border regions. Besides, Graph-DETR3D benefits from a novel depth-invariant multi-scale training strategy, which maintains the visual depth consistency by simultaneously scaling the image size and the object depth. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and efficiency of our Graph-DETR3D. Notably, our best model achieves 49.5 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various published image-view 3D object detectors.
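The abstract states that depth-invariant multi-scale training scales the image size and the object depth together, but does not give the formula. The sketch below illustrates one way this can work, assuming a pinhole camera whose intrinsics are held fixed while the image is resized: an object of physical height H at depth d projects to f * H / d pixels, so enlarging the image by a factor `scale` is consistent with the object sitting at depth d / scale. The function names and the fixed-intrinsics assumption are illustrative only and may differ from the authors' implementation.

```python
def projected_height(focal_px, object_height_m, depth_m):
    """Pinhole projection: pixel height of an object at a given depth."""
    return focal_px * object_height_m / depth_m


def depth_invariant_rescale(image_hw, depth_m, scale):
    """Resize an image by `scale` and adjust the object depth to match.

    Assumption (not taken from the abstract): camera intrinsics stay fixed,
    so an image enlarged by `scale` looks like the object moved to
    depth_m / scale.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    height, width = image_hw
    new_hw = (round(height * scale), round(width * scale))
    new_depth_m = depth_m / scale
    return new_hw, new_depth_m


if __name__ == "__main__":
    # Hypothetical values: a 900x1600 image and an object 20 m away.
    new_hw, new_depth = depth_invariant_rescale((900, 1600), 20.0, 2.0)
    print(new_hw, new_depth)
```

Under this assumption, the pixel size of the object in the resized image matches what the fixed camera model predicts at the adjusted depth, which is the consistency the abstract refers to.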

Citation

Pages: 5999-6008

Page count: 10

57 references in total

[1]

[Anonymous], 2021, ADV NEUR IN

[2]

[Anonymous], 2012, Computational geometry: an introduction

[3] Recent advances in augmented reality [J].

Azuma, R ;

Baillot, Y ;

Behringer, R ;

Feiner, S ;

Julier, S ;

MacIntyre, B .

IEEE COMPUTER GRAPHICS AND APPLICATIONS, 2001, 21 (06) :34-47

[4] A survey of augmented reality [J].

Azuma, RT .

PRESENCE-VIRTUAL AND AUGMENTED REALITY, 1997, 6 (04) :355-385

[5] COMPLEXITY OF FINDING FIXED-RADIUS NEAR NEIGHBORS [J].

BENTLEY, JL ;

STANAT, DF ;

WILLIAMS, EH .

INFORMATION PROCESSING LETTERS, 1977, 6 (06) :209-212

[6]

Caesar H, 2020, PROC CVPR IEEE, P11618, DOI 10.1109/CVPR42600.2020.01164

[7]

Cai H, 2018, INT C LEARN REPR

[8]

Carion N, 2020, Img Proc Comp Vis Re, V12346, P213, DOI 10.1007/978-3-030-58452-8_13

[9] Augmented reality technologies, systems and applications [J].

Carmigniani, Julie ;

Furht, Borko ;

Anisetti, Marco ;

Ceravolo, Paolo ;

Damiani, Ernesto ;

Ivkovic, Misa .

MULTIMEDIA TOOLS AND APPLICATIONS, 2011, 51 (01) :341-377

[10] Efficacy and Safety of Long-Term Low-Dose Clarithromycin in Patients With Refractory Chronic Sinusitis After Endoscopic Sinus Surgery: A Prospective Clinical Trial [J].

Chen, Han ;

Zhou, Bing ;

Huang, Qian ;

Li, Cheng ;

Wu, Yubin ;

Huang, Zhenxiao ;

Li, Yunxia ;

Qu, Jing ;

Xiao, Nianci ;

Wang, Mingjie .

ENT-EAR NOSE & THROAT JOURNAL, 2024, 103 (01) :NP31-NP39
