Kinematic 3D Object Detection in Monocular Video

被引:134
作者
Brazil, Garrick [1 ]
Pons-Moll, Gerard [2 ]
Liu, Xiaoming [1 ]
Schiele, Bernt [2 ]
机构
[1] Michigan State Univ, Comp Sci & Engn, E Lansing, MI 48824 USA
[2] Max Planck Inst Informat, Saarland Informatics Campus, Saarbrucken, Germany
来源
COMPUTER VISION - ECCV 2020, PT XXIII | 2020年 / 12368卷
关键词
3D object detection; Monocular; Video; Physics-based;
D O I
10.1007/978-3-030-58592-1_9
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Perceiving the physical world in 3D is fundamental for self-driving applications. Although temporal motion is an invaluable resource to human vision for detection, tracking, and depth perception, such features have not been thoroughly utilized in modern 3D object detectors. In this work, we propose a novel method for monocular video-based 3D object detection which leverages kinematic motion to extract scene dynamics and improve localization accuracy. We first propose a novel decomposition of object orientation and a self-balancing 3D confidence. We show that both components are critical to enable our kinematic model to work effectively. Collectively, using only a single model, we efficiently leverage 3D kinematics from monocular videos to improve the overall localization precision in 3D object detection while also producing useful by-products of scene dynamics (ego-motion and per-object velocity). We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
引用
收藏
页码:135 / 152
页数:18
相关论文
共 48 条
[11]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[12]   Pedestrian Detection: An Evaluation of the State of the Art [J].
Dollar, Piotr ;
Wojek, Christian ;
Schiele, Bernt ;
Perona, Pietro .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (04) :743-761
[13]   Object Detection with Discriminatively Trained Part-Based Models [J].
Felzenszwalb, Pedro F. ;
Girshick, Ross B. ;
McAllester, David ;
Ramanan, Deva .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2010, 32 (09) :1627-1645
[14]   Deep Ordinal Regression Network for Monocular Depth Estimation [J].
Fu, Huan ;
Gong, Mingming ;
Wang, Chaohui ;
Batmanghelich, Kayhan ;
Tao, Dacheng .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :2002-2011
[15]  
Geiger A, 2012, PROC CVPR IEEE, P3354, DOI 10.1109/CVPR.2012.6248074
[16]   Leveraging Shape Completion for 3D Siamese Tracking [J].
Giancola, Silvio ;
Zarzar, Jesus ;
Ghanem, Bernard .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :1359-1368
[17]   Local Relation Networks for Image Recognition [J].
Hu, Han ;
Zhang, Zheng ;
Xie, Zhenda ;
Lin, Stephen .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :3463-3472
[18]   Densely Connected Convolutional Networks [J].
Huang, Gao ;
Liu, Zhuang ;
van der Maaten, Laurens ;
Weinberger, Kilian Q. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2261-2269
[19]  
Ku J, 2018, IEEE INT C INT ROBOT, P5750, DOI 10.1109/IROS.2018.8594049
[20]   Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction [J].
Ku, Jason ;
Pon, Alex D. ;
Waslander, Steven L. .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :11859-11868