3D Object Detection with a Self-supervised Lidar Scene Flow Backbone

Cited by: 12

Authors:

Ercelik, Emec [1]

Yurtsever, Ekim [2]

Liu, Mingyu [1,3]

Yang, Zhijie [1]

Zhang, Hanzhen [1]

Topcam, Pinar [1]

Listl, Maximilian [1]

Cayli, Yilmaz Kaan [1]

Knoll, Alois [1]

Affiliations:

[1] Tech Univ Munich, Chair Robot Artificial Intelligence & Real Time S, D-85748 Garching, Germany

[2] Ohio State Univ, Columbus, OH 43212 USA

[3] Tongji Univ, Shanghai 201804, Peoples R China

Source:

COMPUTER VISION, ECCV 2022, PT X | 2022 / Vol. 13670

Keywords:

3D detection; Self-supervised learning; Scene flow; Lidar point clouds

DOI:

10.1007/978-3-031-20080-9_15

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

State-of-the-art lidar-based 3D object detection methods rely on supervised learning and large labeled datasets. However, annotating lidar data is resource-consuming, and depending only on supervised learning limits the applicability of trained models. Self-supervised training strategies can alleviate these issues by learning a general point cloud backbone model for downstream 3D vision tasks. Against this backdrop, we show the relationship between self-supervised multi-frame flow representations and single-frame 3D detection hypotheses. Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a supervised 3D detection head. First, a self-supervised scene flow estimation model is trained with cycle consistency. Then, the point cloud encoder of this model is used as the backbone of a single-frame 3D object detection head model. This second 3D object detection model learns to utilize motion representations to distinguish dynamic objects exhibiting different movement patterns. Experiments on KITTI and nuScenes benchmarks show that the proposed self-supervised pre-training increases 3D detection performance significantly. Code: https://github.com/emecercelik/ssl-3d-detection.git.
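
The cycle-consistency idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cycle_consistency_loss`, `constant_flow`, `inconsistent_flow`) and the `direction` argument are assumptions made for illustration, and the toy flow functions stand in for a learned scene flow model. The sketch warps a point cloud forward with the estimated flow, warps it back with the reverse flow, and penalizes the distance to the starting points.

```python
import numpy as np


def cycle_consistency_loss(points, flow_fn):
    """Mean squared distance between points and their forward-then-backward warp.

    points:  (N, 3) array of lidar points in the first frame.
    flow_fn: hypothetical callable (points, direction) -> (N, 3) flow vectors,
             standing in for a learned scene flow estimator.
    """
    forward_flow = flow_fn(points, +1)        # flow from frame t to frame t+1
    warped = points + forward_flow            # predicted positions in frame t+1
    backward_flow = flow_fn(warped, -1)       # flow from frame t+1 back to frame t
    reconstructed = warped + backward_flow    # should land on the original points
    return float(np.mean(np.sum((reconstructed - points) ** 2, axis=1)))


def constant_flow(points, direction):
    """Toy estimator: every point moves 0.5 m along x (reversed when going back)."""
    shift = direction * np.array([0.5, 0.0, 0.0])
    return np.broadcast_to(shift, points.shape)


def inconsistent_flow(points, direction):
    """Toy estimator whose backward flow does not undo its forward flow."""
    step = 0.5 if direction > 0 else -0.3
    return np.broadcast_to(np.array([step, 0.0, 0.0]), points.shape)


if __name__ == "__main__":
    cloud = np.random.default_rng(0).normal(size=(100, 3))
    print(cycle_consistency_loss(cloud, constant_flow))
    print(cycle_consistency_loss(cloud, inconsistent_flow))
```

A consistent estimator yields a loss near zero, while a mismatch between forward and backward flow is penalized. No labels are needed, which is what makes this objective usable for self-supervised pre-training of the point cloud encoder.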


Pages: 247-265

Page count: 19
