Bridging 2D and 3D Object Detection: Advances in Occlusion Handling through Depth Estimation

Cited: 0
Authors
Ouardirhi, Zainab [1 ,2 ]
Zbakh, Mostapha [2 ]
Mahmoudi, Sidi Ahmed [1 ]
Affiliations
[1] Computer and Management Engineering Department, UMONS Faculty of Engineering, Mons
[2] Communication Networks Department, École Nationale Supérieure d'Informatique et d'Analyse des Systèmes (ENSIAS), Mohammed V University in Rabat, Rabat
Source
CMES - Computer Modeling in Engineering and Sciences | 2025, Vol. 143, No. 3
Keywords
3D sensors; depth estimation; monocular; multimodal fusion; object detection; occlusion handling
DOI
10.32604/cmes.2025.064283
Abstract
Object detection in occluded environments remains a core challenge in computer vision (CV), especially in domains such as autonomous driving and robotics. While Convolutional Neural Network (CNN)-based two-dimensional (2D) and three-dimensional (3D) object detection methods have made significant progress, they often fall short under severe occlusion due to depth ambiguities in 2D imagery and the high cost and deployment limitations of 3D sensors such as Light Detection and Ranging (LiDAR). This paper presents a comparative review of recent 2D and 3D detection models, focusing on their occlusion-handling capabilities and the impact of sensor modalities such as stereo vision, Time-of-Flight (ToF) cameras, and LiDAR. In this context, we introduce FuDensityNet, our multimodal occlusion-aware detection framework that combines Red-Green-Blue (RGB) images and LiDAR data to enhance detection performance. As a forward-looking direction, we propose a monocular depth-estimation extension to FuDensityNet, aimed at replacing expensive 3D sensors with a more scalable CNN-based pipeline. Although this enhancement is not experimentally evaluated in this manuscript, we describe its conceptual design and potential for future implementation. Copyright © 2025 The Authors. Published by Tech Science Press.
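The proposed monocular extension would replace LiDAR input with a CNN-predicted depth map, which must then be lifted into a LiDAR-like 3D point cloud before fusion. A minimal sketch of that back-projection step under a standard pinhole camera model; the intrinsics `fx`, `fy`, `cx`, `cy` and the toy depth map are hypothetical values for illustration, not parameters from the paper:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H x W, in metres) into an
    N x 3 point cloud using pinhole intrinsics (fx, fy, cx, cy)."""
    h, w = depth.shape
    # pixel coordinate grids: u along columns, v along rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # drop pixels with no valid depth prediction (z == 0)
    return pts[pts[:, 2] > 0]

# toy example: a 2x2 depth map with one invalid pixel
depth = np.array([[1.0, 2.0],
                  [0.0, 4.0]])
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud.shape)  # (3, 3): one 3D point per valid pixel
```

The resulting point cloud can then feed the same voxelization/fusion stages that a LiDAR scan would, which is what makes a monocular depth network a drop-in, lower-cost substitute in a pipeline of this kind; the accuracy trade-off versus real LiDAR is exactly what the proposed extension would need to evaluate.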
Pages: 2509-2571
Page count: 62