PETR: Position Embedding Transformation for Multi-view 3D Object Detection

Citations: 272
Authors
Liu, Yingfei [1]
Wang, Tiancai [1]
Zhang, Xiangyu [1]
Sun, Jian [1]
Affiliations
[1] MEGVII Technol, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2022, PT XXVII | 2022 / Vol. 13687
Funding
National Key Research and Development Program of China;
Keywords
Position embedding; Transformer; 3D object detection;
DOI
10.1007/978-3-031-19812-0_31
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can perceive the 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark. It can serve as a simple yet strong baseline for future research. Code is available at https://github.com/megvii-research/PETR.
Pages: 531-548
Page count: 18