The ApolloScape Open Dataset for Autonomous Driving and Its Application

Cited by: 385
Authors
Huang, Xinyu [1]
Wang, Peng [1]
Cheng, Xinjing [1]
Zhou, Dingfu [1]
Geng, Qichuan [1]
Yang, Ruigang [1]
Affiliations
[1] Baidu Research, Beijing 100085, People's Republic of China
Keywords
Three-dimensional displays; Semantics; Task analysis; Videos; Labeling; Two dimensional displays; Image segmentation; Autonomous driving; large-scale datasets; scene; lane parsing; self localization; 3D understanding; POSE;
DOI
10.1109/TPAMI.2019.2926463
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104; 0812; 0835; 1405
Abstract
Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks such as 3D map construction, self-localization, parsing the driving road, and understanding objects, which enable vehicles to reason and act. However, large-scale datasets for training and system evaluation remain a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with existing public datasets collected from real scenes, e.g., KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labelling, including holistic semantic dense point clouds for each site, stereo imagery, per-pixel semantic labelling, lane-mark labelling, instance segmentation, 3D car instances, and highly accurate locations for every frame in various driving videos from multiple sites, cities, and times of day. For each task, it contains at least 15x more images than state-of-the-art datasets. To label such a complete dataset, we develop various tools and algorithms tailored to each task to accelerate the labelling process, such as joint 3D-2D segment labelling and active labelling in videos. Building on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial for building a more robust and accurate system. We expect our dataset and the proposed algorithms to support and motivate researchers toward further development of multi-sensor fusion and multi-task learning in the field of computer vision.
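The sensor fusion scheme described in the abstract couples a coarse GPS/IMU pose prior with the camera image and a labelled 3D semantic map. The following is a minimal, hypothetical Python sketch (not the authors' implementation) of one way such a coupling can work: project labelled map points into the image under candidate poses near the GPS/IMU prior and keep the pose whose projections agree best with the per-pixel semantic segmentation. All function names, parameters, and the grid-search refinement below are illustrative assumptions.

import numpy as np

def project(points_cam, K):
    # Pinhole projection of 3D points (N, 3) in the camera frame to pixels (N, 2).
    uv = (K @ points_cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def semantic_agreement(R, t, map_points, map_labels, seg, K):
    # Score a candidate pose by the fraction of projected 3D map points whose
    # semantic label matches the 2D segmentation at the projected pixel.
    cam_pts = (R @ map_points.T).T + t
    in_front = cam_pts[:, 2] > 0.1                      # keep points in front of the camera
    uv = np.round(project(cam_pts[in_front], K)).astype(int)
    h, w = seg.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    if not np.any(ok):
        return 0.0
    pred = seg[uv[ok, 1], uv[ok, 0]]                    # segmentation labels at projected pixels
    return float(np.mean(pred == map_labels[in_front][ok]))

def refine_pose(R_prior, t_prior, map_points, map_labels, seg, K, step=0.25):
    # Grid-search small translation offsets around the GPS/IMU prior and keep
    # the candidate that best agrees with the semantic segmentation.
    best_t = t_prior
    best_score = semantic_agreement(R_prior, t_prior, map_points, map_labels, seg, K)
    for dx in (-step, 0.0, step):
        for dz in (-step, 0.0, step):
            t_cand = t_prior + np.array([dx, 0.0, dz])
            score = semantic_agreement(R_prior, t_cand, map_points, map_labels, seg, K)
            if score > best_score:
                best_t, best_score = t_cand, score
    return best_t, best_score

The method in the paper is more elaborate than this per-frame search; the sketch only illustrates the basic idea of letting a semantic 3D map constrain localization obtained from consumer-grade GPS/IMU.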
Pages: 2702-2719
Number of pages: 18
References (86 in total)
[1] [Anonymous], 2010, International Journal of Computer Vision, DOI 10.1007/s11263-009-0275-4
[2] [Anonymous], 2016, FCNS WILD PIXELLEVEL
[3] [Anonymous], 2016, P C ASS MACH TRANSL
[4] [Anonymous], 2017, IEEE C COMP VIS PATT
[5] [Anonymous], 2020, IEEE T PATTERN ANAL, DOI 10.1109/TPAMI.2018.2844175
[6] Arnab, Anurag; Jayasumana, Sadeep; Zheng, Shuai; Torr, Philip H. S. Higher Order Conditional Random Fields in Deep Neural Networks. Computer Vision - ECCV 2016, Pt II, 2016, 9906: 524-540.
[7] Brostow, Gabriel J.; Fauqueur, Julien; Cipolla, Roberto. Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 2009, 30(2): 88-97.
[8] Byeon, W., 2015, Proc. CVPR IEEE, p. 3547, DOI 10.1109/CVPR.2015.7298977
[9] Campbell, Dylan; Petersson, Lars; Kneip, Laurent; Li, Hongdong. Globally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose and Feature Correspondence. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 1-10.
[10] Chen, Liang-Chieh; Hermans, Alexander; Papandreou, George; Schroff, Florian; Wang, Peng; Adam, Hartwig. MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 4013-4022.