Semi-supervised 3D Object Detection with Proficient Teachers

Cited by: 62
Authors
Yin, Junbo [1 ]
Fang, Jin [2 ,3 ,4 ]
Zhou, Dingfu [2 ,3 ]
Zhang, Liangjun [2 ,3 ]
Xu, Cheng-Zhong [4 ]
Shen, Jianbing [4 ]
Wang, Wenguan [5 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
[2] Baidu Res, Beijing, Peoples R China
[3] Natl Engn Lab Deep Learning Technol & Applicat, Beijing, Peoples R China
[4] Univ Macau, CIS, SKL IOTSC, Zhuhai, Peoples R China
[5] Univ Technol Sydney, ReLER, AAII, Ultimo, Australia
Source
COMPUTER VISION, ECCV 2022, PT XXXVIII | 2022, Vol. 13698
Funding
Australian Research Council;
Keywords
3D object detection; Semi-supervised learning; Point cloud;
DOI
10.1007/978-3-031-19839-7_42
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dominant point cloud-based 3D object detectors in autonomous driving scenarios rely heavily on large amounts of accurately labeled samples; however, 3D annotation of point clouds is extremely tedious, expensive, and time-consuming. To reduce the dependence on large-scale supervision, semi-supervised learning (SSL) based approaches have been proposed. The pseudo-labeling methodology is commonly used in SSL frameworks, but the low-quality predictions from the teacher model severely limit its performance. In this work, we propose a new pseudo-labeling framework for semi-supervised 3D object detection that enhances the teacher model into a proficient one through several necessary designs. First, to improve the recall of pseudo labels, a Spatial-temporal Ensemble (STE) module is proposed to generate sufficient seed boxes. Second, to improve the precision of the recalled boxes, a Clustering-based Box Voting (CBV) module is designed to obtain aggregated votes from the clustered seed boxes. This also eliminates the need for sophisticated thresholds to select pseudo labels. Furthermore, to reduce the negative influence of wrongly pseudo-labeled samples during training, a soft supervision signal is introduced via Box-wise Contrastive Learning (BCL). The effectiveness of our model is verified on both the ONCE and Waymo datasets. For example, on ONCE, our approach significantly improves the baseline by 9.51 mAP. Moreover, with half of the annotations, our model outperforms the oracle model trained with full annotations on Waymo.
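The clustering-and-voting step can be pictured with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes axis-aligned bird's-eye-view boxes parameterized as (x, y, w, l), a greedy IoU-based clustering rule, and confidence-weighted averaging within each cluster; the helper names bev_iou and cluster_and_vote and the 0.55 IoU threshold are hypothetical.

# Minimal, illustrative sketch of clustering-based box voting for aggregating
# pseudo-label seed boxes. All names and thresholds below are assumptions for
# illustration, not the authors' exact CBV module.
import numpy as np

def bev_iou(a, b):
    """Axis-aligned IoU of two BEV boxes given as (x, y, w, l)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def cluster_and_vote(seed_boxes, scores, iou_thr=0.55):
    """Greedily cluster seed boxes by BEV IoU and return one voted box per
    cluster as the confidence-weighted average of its members."""
    order = np.argsort(-np.asarray(scores))      # highest-confidence first
    unassigned = list(order)
    voted_boxes, voted_scores = [], []
    while unassigned:
        anchor = unassigned.pop(0)
        members, rest = [anchor], []
        for idx in unassigned:
            if bev_iou(seed_boxes[anchor], seed_boxes[idx]) >= iou_thr:
                members.append(idx)
            else:
                rest.append(idx)
        unassigned = rest
        w = np.asarray([scores[i] for i in members])
        w = w / w.sum()
        boxes = np.asarray([seed_boxes[i] for i in members])
        voted_boxes.append((w[:, None] * boxes).sum(axis=0))   # weighted vote
        voted_scores.append(float(np.mean([scores[i] for i in members])))
    return np.asarray(voted_boxes), voted_scores

# Toy usage: two overlapping seeds collapse into one voted pseudo label,
# the isolated third seed forms its own cluster.
seeds = np.array([[10.0, 5.0, 1.8, 4.2],
                  [10.2, 5.1, 1.7, 4.0],
                  [30.0, -2.0, 1.9, 4.5]])
confs = [0.9, 0.7, 0.6]
boxes, out_confs = cluster_and_vote(seeds, confs)
print(boxes.shape)  # (2, 4): two clusters -> two voted boxes

Voting over an ensemble of seed boxes replaces the hard confidence threshold of standard pseudo-labeling with a consensus estimate, which is the spirit of the CBV design described in the abstract.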
Pages: 727-743
Page count: 17