CenterLoc3D: monocular 3D vehicle localization network for roadside surveillance cameras

Cited by: 4
Authors
Tang, Xinyao [1]
Wang, Wei [1]
Song, Huansheng [1]
Zhao, Chunhui [1]
Affiliations
[1] Changan Univ, Sch Informat Engn, Xian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Intelligent transportation system; Monocular 3D vehicle localization; Roadside monocular camera; Multi-scale weighted-fusion module; Spatial constraints;
DOI
10.1007/s40747-022-00962-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Monocular 3D vehicle localization is an important task for vehicle behaviour analysis, traffic flow parameter estimation and autonomous driving in Intelligent Transportation Systems (ITS) and Cooperative Vehicle Infrastructure Systems (CVIS), and is usually achieved by monocular 3D vehicle detection. However, monocular cameras cannot obtain depth information directly because of their imaging mechanism, which makes monocular 3D tasks more challenging. Currently, most monocular 3D vehicle detection methods still rely on 2D detectors and additional geometric constraint modules to recover 3D vehicle information, which reduces efficiency. At the same time, most research is based on datasets of onboard scenes rather than the roadside perspective, which limits its use in large-scale 3D perception. Therefore, we focus on 3D vehicle detection without 2D detectors in roadside scenes. We propose CenterLoc3D, a 3D vehicle localization network for roadside monocular cameras, which directly predicts the centroid and eight vertices in image space and the dimensions of 3D bounding boxes, without 2D detectors. To improve the precision of 3D vehicle localization, we propose a multi-scale weighted-fusion module and a loss with spatial constraints, both embedded in CenterLoc3D. Firstly, the transformation matrix between 2D image space and 3D world space is solved by camera calibration. Secondly, the vehicle type, centroid, eight vertices and dimensions of the 3D vehicle bounding boxes are obtained by CenterLoc3D. Finally, the centroid in 3D world space is obtained from the camera calibration and the CenterLoc3D outputs for 3D vehicle localization. To the best of our knowledge, this is the first application of 3D vehicle localization for roadside monocular cameras. Hence, we also propose a benchmark for this application, including a dataset (SVLD-3D), an annotation tool (LabelImg-3D) and evaluation metrics. Through experimental validation, the proposed method achieves high accuracy, with an AP(3D) of 51.30%, average 3D localization precision of 98% and average 3D dimension precision of 85%, and real-time performance at 41.18 FPS.
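The last step of the pipeline described in the abstract, mapping an image-space centroid to 3D world space through the calibrated transformation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a standard pinhole model with a 3x4 projection matrix P and assumes the height z of the centroid above the road plane is known (for example, derived from a predicted vehicle height). The function name image_to_world and all numeric values are hypothetical.

```python
import numpy as np


def image_to_world(P, u, v, z):
    """Back-project pixel (u, v) to the world point lying at known height z.

    Assumes the pinhole relation  s * [u, v, 1]^T = P @ [X, Y, z, 1]^T,
    where P is a 3x4 projection matrix from camera calibration.
    """
    P = np.asarray(P, dtype=float)
    # Unknowns are X, Y and the scale s; move them to the left-hand side:
    #   P[:,0]*X + P[:,1]*Y - s*[u, v, 1] = -(P[:,2]*z + P[:,3])
    A = np.column_stack((P[:, 0], P[:, 1], -np.array([u, v, 1.0])))
    b = -(P[:, 2] * z + P[:, 3])
    X, Y, _s = np.linalg.solve(A, b)
    return np.array([X, Y, z])


# Hypothetical calibration: camera 8 m above the road, looking straight down.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])          # world-to-camera rotation
t = np.array([[0.0], [0.0], [8.0]])     # world-to-camera translation
P = K @ np.hstack((R, t))

# Project a hypothetical centroid into the image, then recover it.
centroid_world = np.array([2.0, 5.0, 0.75])
uvw = P @ np.append(centroid_world, 1.0)
u, v = uvw[:2] / uvw[2]
recovered = image_to_world(P, u, v, z=0.75)
```

A single pixel only constrains a ray in world space, so fixing the height z is what makes the system solvable. The sketch fails with a singular-matrix error if that ray is parallel to the plane at height z.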
Pages: 4349-4368
Number of pages: 20