Transformers for Object Detection in Large Point Clouds

Cited by: 1

Authors:

Ruppel, Felicia [1,2]

Faion, Florian [1]

Glaeser, Claudius [1]

Dietmayer, Klaus [2]

Affiliations:

[1] Robert Bosch GmbH, Corp Res, D-71272 Renningen, Germany

[2] Ulm Univ, Inst Measurement Control & Microtechnol, Ulm, Germany

Source:

2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC) | 2022

Keywords:

DOI:

10.1109/ITSC55140.2022.9921840

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present TransLPC, a novel detection model for large point clouds that is based on a transformer architecture. While object detection with transformers has been an active field of research, it has proved difficult to apply such models to point clouds that span a large area, e.g. those that are common in autonomous driving, with lidar or radar data. TransLPC is able to remedy these issues: The structure of the transformer model is modified to allow for larger input sequence lengths, which are sufficient for large point clouds. Besides this, we propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries. The queries are repositioned between layers, moving them closer to the bounding box they are estimating, in an efficient manner. This simple technique has a significant effect on detection accuracy, which is evaluated on the challenging nuScenes dataset on real-world lidar data. Besides this, the proposed method is compatible with existing transformer-based solutions that require object detection, e.g. for joint multi-object tracking and detection, and enables them to be used in conjunction with large point clouds.
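The abstract does not say how the query repositioning is computed, so the following is only a toy sketch of the general idea: between decoder layers, each query's position is replaced by the box center it currently estimates. The names `refine_queries` and `toy_predictor` are hypothetical, and the "decoder layer" here is a stand-in that steps halfway toward the nearest object; it is not the TransLPC implementation.

```python
import numpy as np

def refine_queries(query_xy, predict_center, num_layers=3):
    """Reposition queries between decoder layers (illustrative assumption only)."""
    history = [query_xy.copy()]
    for layer in range(num_layers):
        # Each layer estimates a box center for every query.
        centers = predict_center(query_xy, layer)
        # Move each query onto its current estimate before the next layer runs.
        query_xy = centers
        history.append(query_xy.copy())
    return query_xy, history

def toy_predictor(objects_xy):
    """Hypothetical stand-in for a decoder layer: step halfway to the nearest object."""
    def predict(query_xy, layer):
        # Pairwise distances between queries (Q, 2) and objects (K, 2).
        diffs = query_xy[:, None, :] - objects_xy[None, :, :]
        dists = np.linalg.norm(diffs, axis=-1)
        nearest = objects_xy[dists.argmin(axis=1)]
        return query_xy + 0.5 * (nearest - query_xy)
    return predict

objects = np.array([[10.0, 0.0], [-20.0, 5.0]])   # made-up object centers (meters)
queries = np.array([[0.0, 0.0], [-30.0, 0.0]])    # made-up initial query positions
final_xy, history = refine_queries(queries, toy_predictor(objects), num_layers=3)
```

In this toy setup the number of queries stays fixed while their positions converge toward the objects, which mirrors the abstract's point that accuracy can improve without increasing the query count.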

Citation

Pages: 832-838

Page count: 7

References (31 in total)

[1] Arnold, Eduardo; Al-Jarrah, Omar Y.; Dianati, Mehrdad; Fallah, Saber; Oxtoby, David; Mouzakitis, Alex. A Survey on 3D Object Detection Methods for Autonomous Driving Applications [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2019, 20 (10): 3782-3795

[2] Brown TB, 2020, ADV NEUR IN, V33

[3] Caesar, Holger; Bankiti, Varun; Lang, Alex H.; Vora, Sourabh; Liong, Venice Erin; Xu, Qiang; Krishnan, Anush; Pan, Yu; Baldan, Giancarlo; Beijbom, Oscar. nuScenes: A multimodal dataset for autonomous driving [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 11618-11628

[4] Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. End-to-End Object Detection with Transformers [J]. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229

[5] Dai, Angela; Chang, Angel X.; Savva, Manolis; Halber, Maciej; Funkhouser, Thomas; Niessner, Matthias. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 2432-2443

[6] Devlin J, 2019, Arxiv, DOI [arXiv:1810.04805, 10.48550/arXiv.1810.04805]

[7] Dosovitskiy A, 2021, Arxiv, DOI [arXiv:2010.11929, 10.48550/arXiv.2010.11929]

[8] Engelcke Martin, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA), P1355, DOI 10.1109/ICRA.2017.7989161

[9] Gao P., 2021, P IEEE CVF C COMP VI, P3621

[10] Guo, Yulan; Wang, Hanyun; Hu, Qingyong; Liu, Hao; Liu, Li; Bennamoun, Mohammed. Deep Learning for 3D Point Clouds: A Survey [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12): 4338-4364
