PVONet: point-voxel-based semi-supervision monocular three-dimensional object detection using LiDAR camera systems

Cited by: 2
Authors
Wang, Haosen [1]
Ji, Xiaohang [2]
Peng, Kejin [2]
Wang, Wanqiu [3]
Wang, Shifeng [1, 2]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Zhongshan Inst, Changchun, Peoples R China
[2] Changchun Univ Sci & Technol, Changchun, Peoples R China
[3] Changchun Univ Sci & Technol, Opt Engn, Changchun, Peoples R China
Keywords
semi-supervised; LiDAR camera system; three-dimensional object detection; feature extraction
DOI
10.1117/1.JEI.32.5.053015
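Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract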
Light detection and ranging (LiDAR) camera systems are becoming increasingly vital for autonomous driving, and monocular three-dimensional (3D) detection is a critical and challenging task in this field. However, most algorithms rely solely on manually labeled images, whose annotation is time-consuming and labor-intensive, and the resulting detections lack depth information. To address this problem, a semi-supervised 3D object detection model based on LiDAR camera systems (PVONet) is proposed to improve both detection accuracy and processing time. First, an innovative data preparation block, point-voxel fusion estimation, is introduced; it uses LiDAR points to generate 3D bounding boxes for unlabeled data, significantly reducing the time required compared with manual labeling. Second, a new block based on a fully connected neural network for box estimation (feature extraction and 3D object detection) is presented; it performs feature extraction, feature correlation, and 3D box estimation on monocular images. Finally, comprehensive experiments on the popular KITTI 3D detection dataset demonstrate that PVONet is faster (30 ms on the KITTI benchmark) and more accurate than the baseline, with increases of 4.69%/3.82% (easy), 4.45%/2.79% (moderate), and 4.07%/3.75% (hard) on 3D/bird's-eye-view objects. This meets the real-time performance requirements of autonomous vehicle applications. The results demonstrate the effectiveness of the model based on LiDAR camera systems. © 2023 SPIE and IS&T
Pages: 15