A survey: object detection methods from CNN to transformer

Cited by: 54
Authors
Arkin, Ershat [1]
Yadikar, Nurbiya [1]
Xu, Xuebin [1]
Aysa, Alimjan [2]
Ubul, Kurban [1, 2]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[2] Xinjiang Univ, Key Lab Multilingual Informat Technol, Urumqi 830046, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Computer vision; Object detection; Real-time system; CNN; Transformer; NETWORKS;
DOI
10.1007/s11042-022-13801-3
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Object detection is one of the most important problems in computer vision. Since AlexNet was proposed, methods based on Convolutional Neural Networks (CNNs) have become mainstream in the field, and much research on neural networks and on variations of algorithm structures has appeared. Achieving fast and accurate detection requires moving beyond the existing CNN framework, which poses great challenges. The Transformer's relatively mature theoretical support and technological development in Natural Language Processing have brought it to researchers' attention, and it has been shown that Transformer-based methods can be applied to computer vision tasks and that they outperform existing CNN methods on some of them. To help more researchers understand the development of object detection methods, existing methods, different frameworks, challenging problems, and development trends, this paper introduces the classic historical CNN-based object detection methods and discusses their highlights, advantages, and disadvantages. Drawing on a large body of literature, the paper compares different CNN detection methods and Transformer detection methods. In a vertical comparison under fair conditions, 13 detection methods that have had a broad impact on the field and are the most mainstream and promising are selected. The comparative data give us confidence in the development of the Transformer and in the convergence between different methods. The paper also presents recent innovative approaches to using the Transformer in computer vision tasks. Finally, the challenges, opportunities, and future prospects of this field are summarized.
Pages: 21353 - 21383
Page count: 31
Related papers
50 in total
  • [21] SODFormer: Streaming Object Detection With Transformer Using Events and Frames
    Li, Dianze
    Tian, Yonghong
    Li, Jianing
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 14020 - 14037
  • [22] An Ensemble Method of CNN Models for Object Detection
    Lee, Jinsu
    Lee, Sang-Kwang
    Yang, Seong-Il
    2018 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC), 2018 : 898 - 901
  • [23] Malicious DNS detection by combining improved transformer and CNN
    Li, Heyu
    Li, Zhangmeizhi
    Zhang, Shuyan
    Pu, Xiao
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [24] 2D Object Detection: A Survey
    Malagoli, Emanuele
    Di Persio, Luca
    MATHEMATICS, 2025, 13 (06)
  • [25] A survey of the vision transformers and their CNN-transformer based variants
    Asifullah Khan
    Zunaira Rauf
    Anabia Sohail
    Abdul Rehman Khan
    Hifsa Asif
    Aqsa Asif
    Umair Farooq
    Artificial Intelligence Review, 2023, 56 : 2917 - 2970
  • [26] A comparison of transformer and CNN-based object detection models for surface defects on Li-Ion Battery Electrodes
    Mattern, Alexander
    Gerdes, Henrik
    Grunert, Dennis
    Schmitt, Robert H.
    JOURNAL OF ENERGY STORAGE, 2025, 105
  • [27] A Survey of Object Detection Models and Its Optimization Methods
    Hong-Yi J.
    Yong-Juan W.
    Jin-Yu K.
    Zidonghua Xuebao/Acta Automatica Sinica, 2021, 47 (06) : 1232 - 1255
  • [28] Detection of Marine Oil Spill from PlanetScope Images Using CNN and Transformer Models
    Kang, Jonggu
    Yang, Chansu
    Yi, Jonghyuk
    Lee, Yangwon
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (11)
  • [29] Efficient convolutional neural networks and network compression methods for object detection: a survey
    Yong Zhou
    Lei Xia
    Jiaqi Zhao
    Rui Yao
    Bing Liu
    Multimedia Tools and Applications, 2024, 83 : 10167 - 10209
  • [30] A Survey on 3D Object Detection Methods for Autonomous Driving Applications
    Arnold, Eduardo
    Al-Jarrah, Omar Y.
    Dianati, Mehrdad
    Fallah, Saber
    Oxtoby, David
    Mouzakitis, Alex
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2019, 20 (10) : 3782 - 3795