Semi-Open Set Object Detection Algorithm Leveraged by Multi-Modal Large Language Models

Authors
Wu, Kewei [1]
Wang, Yiran [1]
He, Xiaogang [1]
Yan, Jinyu [2]
Guo, Yang [2]
Jiang, Zhuqing [1]
Zhang, Xing [3]
Wang, Wei [3]
Xiong, Yongping [1]
Men, Aidong [1]
Xiao, Li [1]
Affiliations
[1] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, 10 Xitucheng Rd, Beijing
[2] Beijing Zhuoshizhitong Technology Co., Ltd., Beijing
[3] China Resources Digital Co., Ltd., Beijing
Keywords
computer vision; large-scale foundation model; object detection
DOI
10.3390/bdcc8120175
Abstract
Closed-set object detection models, typified by YOLO, are widely deployed in industry. However, such models offer insufficient capacity for tuning on easily confused objects in complex detection scenarios. Open-set object detection models such as GroundingDINO broaden the range of detectable objects to some extent, but their detection accuracy still lags behind that of closed-set models and does not meet the high-precision requirements of practical applications. Existing detection techniques also offer limited interpretability: they cannot clearly show users the basis and process behind a detection result, which undermines users' trust in those results and their willingness to apply them. To address these deficiencies, we propose a new object detection algorithm based on multi-modal large language models that significantly improves the performance of closed-set object detection models on more difficult boundary tasks while maintaining detection accuracy, thereby achieving semi-open set object detection. Verified on seven common traffic and safety production scenarios, the algorithm shows significant improvements in accuracy and interpretability. © 2024 by the authors.