CNN Variants for Computer Vision: History, Architecture, Application, Challenges and Future Scope

Cited by: 439
Authors
Bhatt, Dulari [1]
Patel, Chirag [2]
Talsania, Hardik [1]
Patel, Jigar [1]
Vaghela, Rasmika [1]
Pandya, Sharnil [3]
Modi, Kirit [4]
Ghayvat, Hemant [5]
Affiliations
[1] Parul Univ, Ahmadabad 382030, Gujarat, India
[2] DEPSTAR, Comp Sci & Engn, Changa 388421, Gujarat, India
[3] Symbiosis Int Deemed Univ, Symbiosis Inst Technol, Pune 412115, India
[4] Sankalchand Patel Univ, Sankalchand Patel Coll Engn, Visnagar 384315, India
[5] Linnaeus Univ, Fac Technol, Comp Sci Dept, PG Vejdes Vag, S-35195 Vaxjo, Sweden
Keywords
CNN; feature-map exploitation; attention-based CNN; deep CNN; object recognition; computer vision; convolutional neural network; action recognition
DOI
10.3390/electronics10202470
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Computer vision is an increasingly prominent topic within image processing. With the emergence of computer vision applications, there is significant demand for automatic object recognition. Deep convolutional neural networks (CNNs) have benefited the computer vision community by producing excellent results in video processing, object recognition, image classification and segmentation, natural language processing, speech recognition, and many other fields. Furthermore, the availability of large datasets and readily accessible hardware has opened new avenues for CNN research. Several ideas for advancing CNNs have been investigated, including alternative activation functions, regularization, parameter optimization, and architectural advances. In particular, architectural innovations have greatly enhanced the capacity of deep CNNs. Significant emphasis has been placed on leveraging channel and spatial information, on architectural depth, and on multi-path information processing. This survey focuses mainly on the primary taxonomy and recently released deep CNN architectures, and it divides recent developments in CNN architectures into eight categories: spatial exploitation, multi-path, depth, breadth, dimension, channel boosting, feature-map exploitation, and attention-based CNN. The main contribution of this manuscript is a comparison of the architectural evolution of CNNs in terms of architectural changes, strengths, and weaknesses. It also explains the components of a CNN and covers research gaps and open challenges, CNN applications, and future research directions.
Pages: 28