DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

Cited by: 219
Authors
Yu, Haibao [1]
Luo, Yizhen [1,3,5]
Shu, Mao [2]
Huo, Yiyi [1,4,5]
Yang, Zebang [1,3,5]
Shi, Yifeng [2]
Guo, Zhenglong [2]
Li, Hanyu [2]
Hu, Xing [2]
Yuan, Jirui [1]
Nie, Zaiqing [1]
Affiliations
[1] Tsinghua Univ, Inst AI Ind Res AIR, Beijing, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
[5] AIR, Beijing, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
Keywords
LIDAR; TRACKING; FUSION;
DOI
10.1109/CVPR52688.2022.02067
Chinese Library Classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Autonomous driving faces great safety challenges for a lack of global perspective and the limitation of long-range perception capabilities. It has been widely agreed that vehicle-infrastructure cooperation is required to achieve Level 5 autonomy. However, there is still NO dataset from real scenarios available for computer vision researchers to work on vehicle-infrastructure cooperation-related problems. To accelerate computer vision research and innovation for Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD), we release DAIR-V2X Dataset, which is the first large-scale, multi-modality, multi-view dataset from real scenarios for VICAD. DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. The Vehicle-Infrastructure Cooperative 3D Object Detection problem (VIC3D) is introduced, formulating the problem of collaboratively locating and identifying 3D objects using sensory inputs from both vehicle and infrastructure. In addition to solving traditional 3D object detection problems, the solution of VIC3D needs to consider the temporal asynchrony problem between vehicle and infrastructure sensors and the data transmission cost between them. Furthermore, we propose Time Compensation Late Fusion (TCLF), a late fusion framework for the VIC3D task as a benchmark based on DAIR-V2X. Find data, code, and more up-to-date information at https://thudair.baai.ac.cn/index and https://github.com/AIR-THU/DAIR-V2X.
Pages: 21329-21338
Page count: 10
Related papers
31 in total
[1]  
[Anonymous], 2020, MMDETECTION3D OPENMM
[2]  
[Anonymous], 2010, International journal of computer vision, DOI 10.1007/s11263-009-0275-4
[3]   nuScenes: A multimodal dataset for autonomous driving [J].
Caesar, Holger ;
Bankiti, Varun ;
Lang, Alex H. ;
Vora, Sourabh ;
Liong, Venice Erin ;
Xu, Qiang ;
Krishnan, Anush ;
Pan, Yu ;
Baldan, Giancarlo ;
Beijbom, Oscar .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11618-11628
[4]   Multi-View 3D Object Detection Network for Autonomous Driving [J].
Chen, Xiaozhi ;
Ma, Huimin ;
Wan, Ji ;
Li, Bo ;
Xia, Tian .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6526-6534
[5]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[6]   Automatic Vehicle Tracking With Roadside LiDAR Data for the Connected-Vehicles System [J].
Cui, Yuepeng ;
Xu, Hao ;
Wu, Jianqing ;
Sun, Yuan ;
Zhao, Junxuan .
IEEE INTELLIGENT SYSTEMS, 2019, 34 (03) :44-51
[7]   Object Classification Using CNN-Based Fusion of Vision and LIDAR in Autonomous Vehicle Environment [J].
Gao, Hongbo ;
Cheng, Bo ;
Wang, Jianqiang ;
Li, Keqiang ;
Zhao, Jianhui ;
Li, Deyi .
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2018, 14 (09) :4224-4231
[8]  
Geiger A, 2012, PROC CVPR IEEE, P3354, DOI 10.1109/CVPR.2012.6248074
[9]  
Haghbayan MH, 2018, IEEE INT C INTELL TR, P2163, DOI 10.1109/ITSC.2018.8569890
[10]  
Krajewski R, 2018, IEEE INT C INTELL TR, P2118, DOI 10.1109/ITSC.2018.8569552