A survey: object detection methods from CNN to transformer

Cited by: 54
Authors
Arkin, Ershat [1]
Yadikar, Nurbiya [1]
Xu, Xuebin [1]
Aysa, Alimjan [2]
Ubul, Kurban [1, 2]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[2] Xinjiang Univ, Key Lab Multilingual Informat Technol, Urumqi 830046, Peoples R China
Funding
US National Science Foundation;
Keywords
Computer vision; Object detection; Real-time system; CNN; Transformer; NETWORKS;
DOI
10.1007/s11042-022-13801-3
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Object detection is the most important problem in computer vision. Since AlexNet was proposed, methods based on Convolutional Neural Networks (CNNs) have become mainstream in the field, and much research on neural networks and on variations of algorithm structures has appeared. Achieving fast and accurate detection requires moving beyond the existing CNN framework, which poses great challenges. The Transformer's relatively mature theoretical foundation and technological development in Natural Language Processing have brought it to researchers' attention, and Transformer-based methods have been shown to work for computer vision tasks and to outperform existing CNN methods on some of them. To help more researchers understand the development of object detection methods, existing methods, different frameworks, challenging problems, and development trends, this paper introduces the classic CNN-based object detection methods and discusses their highlights, advantages, and disadvantages. Drawing on a large body of literature, the paper compares different CNN detection methods and Transformer detection methods. In a vertical comparison under fair conditions, 13 detection methods that have had a broad impact on the field and are the most mainstream and promising are selected. The comparative data give us confidence in the development of the Transformer and in the convergence between different methods. The paper also presents recent innovative approaches to using the Transformer in computer vision tasks. Finally, the challenges, opportunities, and future prospects of the field are summarized.
Pages: 21353-21383
Number of pages: 31
Related papers
50 in total
  • [1] A survey: object detection methods from CNN to transformer
    Ershat Arkin
    Nurbiya Yadikar
    Xuebin Xu
    Alimjan Aysa
    Kurban Ubul
    Multimedia Tools and Applications, 2023, 82 : 21353 - 21383
  • [2] Combining transformer and CNN for object detection in UAV imagery
    Hendria, Willy Fitra
    Phan, Quang Thinh
    Adzaka, Fikriansyah
    Jeong, Cheol
    ICT EXPRESS, 2023, 9 (02): : 258 - 263
  • [3] Remote sensing object detection based on a combination of a CNN and the Swin transformer
    Yang, Liu
    Liang, Junhong
    Guo, Liang
    Long, Yang
    Ding, Kaiyan
    He, Qingfang
    Zhang, Zhihang
    REMOTE SENSING LETTERS, 2023, 14 (05) : 450 - 460
  • [4] Transformer with Transfer CNN for Remote-Sensing-Image Object Detection
    Li, Qingyun
    Chen, Yushi
    Zeng, Ying
    REMOTE SENSING, 2022, 14 (04)
  • [5] Transformer-CNN for small image object detection
    Chen, Yan-Lin
    Lin, Chun-Liang
    Lin, Yu-Chen
    Chen, Tzu-Chun
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2024, 129
  • [6] A CNN-Transformer Hybrid Model Based on CSWin Transformer for UAV Image Object Detection
    Lu, Wanjie
    Lan, Chaozhen
    Niu, Chaoyang
    Liu, Wei
    Lyu, Liang
    Shi, Qunshan
    Wang, Shiju
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 1211 - 1231
  • [7] Pairwise CNN-Transformer Features for Human-Object Interaction Detection
    Quan, Hutuo
    Lai, Huicheng
    Gao, Guxue
    Ma, Jun
    Li, Junkai
    Chen, Dongji
    ENTROPY, 2024, 26 (03)
  • [8] Object Detection Algorithm Based on CNN-Transformer Dual Modal Feature Fusion
    Yang Chen
    Hou Zhiqiang
    Li Xinyue
    Ma Sugang
    Yang Xiaobao
    ACTA PHOTONICA SINICA, 2024, 53 (03)
  • [9] A survey of the vision transformers and their CNN-transformer based variants
    Khan, Asifullah
    Raufu, Zunaira
    Sohail, Anabia
    Khan, Abdul Rehman
    Asif, Hifsa
    Asif, Aqsa
    Farooq, Umair
    ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (SUPPL3) : S2917 - S2970
  • [10] An Object Detection Model for Power Lines With Occlusions Combining CNN and Transformer
    Shi, Weicheng
    Lyu, Xiaoqin
    Han, Lei
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74