A review of convolutional neural networks in computer vision

Times cited: 219
Authors
Zhao, Xia [1]
Wang, Limin [1]
Zhang, Yufei [2]
Han, Xuming [3]
Deveci, Muhammet [4,5,6]
Parmar, Milan [7]
Affiliations
[1] Guangdong Univ Finance & Econ, Sch Informat Sci, Guangzhou 510320, Peoples R China
[2] Changchun Univ Sci & Technol, Sch Comp Sci & Technol, Changchun 130022, Peoples R China
[3] Jinan Univ, Sch Informat Sci & Technol, Guangzhou 510632, Peoples R China
[4] Natl Def Univ, Turkish Naval Acad, Dept Ind Engn, TR-34942 Istanbul, Turkiye
[5] UCL, Bartlett Sch Sustainable Construction, 1-19 Torrington Pl, London WC1E 7HB, England
[6] Lebanese Amer Univ, Dept Elect & Comp Engn, Byblos, Lebanon
[7] Mississippi State Univ, Dept Comp Sci & Engn, Starkville, MS 39762 USA
Keywords
Convolutional neural networks; Computer vision; Status quo review; Deep learning; MODELS
DOI
10.1007/s10462-024-10721-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In computer vision, the rapid development of deep convolutional neural networks (CNNs) has driven notable advances in areas such as image classification, semantic segmentation, object detection, and image super-resolution reconstruction. CNNs excel at learning and representing features autonomously: by training a CNN model suited to a practical application, features can be extracted directly from the raw input data. With the rapid progress of deep learning technology, CNN architectures have become increasingly complex and diverse, and CNNs have gradually replaced traditional machine learning methods. This paper first gives a basic introduction to the components of a CNN and their functions, including input layers, convolution layers, pooling layers, activation functions, batch normalization, dropout, fully connected layers, and output layers. On this basis, it gives a comprehensive overview of past and current research on applying CNN models in computer vision fields, e.g., image classification, object detection, and video prediction. In addition, we summarize the challenges facing deep CNNs and their solutions, and discuss future research directions.
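The abstract lists the standard CNN building blocks. As a rough illustration only (not the authors' code), the following pure-Python sketch chains them in a single forward pass on toy data. It assumes a single 3x3 filter, valid stride-1 convolution computed as cross-correlation (as deep learning libraries do), ReLU as the activation function, 2x2 max pooling, batch normalization that always uses mini-batch statistics (no running averages, fixed gamma = 1 and beta = 0), inverted dropout, one fully connected layer, and a softmax output layer. All function names, the toy input, and the weights are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding), stride-1 2D cross-correlation, which CNN libraries call 'convolution'."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def relu(x: Matrix) -> Matrix:
    """Activation function: element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)] for i in range(0, len(x) - 1, 2)]


def batch_norm(values: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize one feature across a mini-batch (mean 0, variance ~1), then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def dropout(values: List[float], p: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Output layer: turn logits into class probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(batch: List[Matrix], kernel: Matrix, weights: Matrix, bias: List[float],
            rng: random.Random, p_drop: float = 0.5, training: bool = True) -> List[List[float]]:
    """Input -> conv -> ReLU -> pool -> flatten -> batch norm -> dropout -> dense -> softmax."""
    feats = []
    for image in batch:
        fmap = max_pool2x2(relu(conv2d(image, kernel)))
        feats.append([v for row in fmap for v in row])  # flatten the pooled feature map
    # Batch norm works per feature across the mini-batch, so normalize column-wise.
    cols = [batch_norm([f[k] for f in feats]) for k in range(len(feats[0]))]
    normed = [[col[n] for col in cols] for n in range(len(feats))]
    return [softmax(dense(dropout(f, p_drop, rng, training), weights, bias)) for f in normed]


if __name__ == "__main__":
    # Toy mini-batch of three 6x6 single-channel "input layer" images.
    batch = [[[float((i * j + n) % 4) for j in range(6)] for i in range(6)] for n in range(3)]
    kernel = [[1.0, 0.0, -1.0]] * 3  # hypothetical 3x3 vertical-edge filter
    weights = [[0.1 * (r + c) for c in range(4)] for r in range(3)]  # hypothetical: 3 classes x 4 features
    for probs in forward(batch, kernel, weights, [0.0] * 3, random.Random(0), training=False):
        print([round(p, 3) for p in probs])
```

A 6x6 input shrinks to 4x4 after the 3x3 convolution and to 2x2 after pooling, so the fully connected layer sees four features per image. Real frameworks also switch batch normalization to running statistics at inference time; this sketch omits that for brevity.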
Pages: 43