Performance analysis of hybrid deep learning framework using a vision transformer and convolutional neural network for handwritten digit recognition

Cited by: 5
Authors
Agrawal, Vanita [1 ]
Jagtap, Jayant [2 ]
Patil, Shruti [3 ]
Kotecha, Ketan [3 ,4 ]
Affiliations
[1] Symbiosis Int Deemed Univ, Symbiosis Inst Technol, Dept Comp Sci & Informat Technol, Pune, Maharashtra, India
[2] NIMS Univ Rajasthan, NIMS Inst Comp Artificial Intelligence & Machine L, Jaipur, India
[3] Symbiosis Int Deemed Univ, Symbiosis Inst Technol, Symbiosis Ctr Appl Artificial Intelligence SCAAI, Pune, Maharashtra, India
[4] UCSI Univ, Kuala Lumpur 56000, Malaysia
Keywords
Convolutional Neural Network; Vision Transformer; Handwritten Digit Recognition; Machine Learning; Computer Vision;
DOI
10.1016/j.mex.2024.102554
CLC classification numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification codes
07; 0710; 09;
Abstract
Digitization has created a demand for highly efficient handwritten document recognition systems. A handwritten document consists of digits, text, symbols, diagrams, etc., and digits are an essential element; their accurate recognition is vital for effective communication and data analysis. Various researchers have addressed this problem with modern convolutional neural network (CNN) techniques. However, although CNNs achieve high recognition accuracy, their filter weights are fixed after training, so the model cannot flexibly adapt to changes in the input. Computer vision researchers have therefore recently turned to Vision Transformers (ViTs) and Multilayer Perceptrons (MLPs), and the shortcomings of CNNs gave rise to hybrid models that combine the best elements of both fields. This paper analyzes how a hybrid convolutional ViT model affects the ability to recognize handwritten digits. Because real-time data contains noise, distortions, and varying writing styles, both cleaned and uncleaned handwritten digit images are used for evaluation. The accuracy of the proposed method is compared with state-of-the-art techniques, and the results show that the proposed model achieves the highest recognition accuracy. Probable solutions for recognizing other aspects of handwritten documents are also discussed.
• Analyzed the effect of a convolutional vision transformer on cleaned and real-time handwritten digit images.
• The model's performance improved with the application of cross-validation and hyperparameter tuning.
• The results show that the proposed model is robust, feasible, and effective on cleaned and uncleaned handwritten digits.
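To illustrate the hybrid idea the abstract describes — a convolutional stem that turns a digit image into patch tokens, followed by transformer-style self-attention — here is a minimal NumPy forward-pass sketch. This is not the authors' architecture: the layer sizes, random weights, and function names are all hypothetical, and it omits layer normalization, multiple heads, and training.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv_patch_embed(img, w, b):
    """Non-overlapping conv stem (stride == kernel): each k x k patch of the
    image is projected to a d-dimensional token, as in a ViT patch embedding."""
    d, k, _ = w.shape
    g = img.shape[0] // k                      # tokens per side
    patches = (img.reshape(g, k, g, k)
                  .transpose(0, 2, 1, 3)
                  .reshape(g * g, k * k))      # (num_tokens, k*k)
    return patches @ w.reshape(d, -1).T + b    # (num_tokens, d)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the token sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return att @ v

# Hypothetical sizes: 28x28 digit, 4x4 patches -> 49 tokens of dimension 16.
d, k = 16, 4
w_conv = rng.normal(0, 0.1, (d, k, k)); b_conv = np.zeros(d)
wq, wk_, wv = (rng.normal(0, 0.1, (d, d)) for _ in range(3))
w_cls = rng.normal(0, 0.1, (d, 10)); b_cls = np.zeros(10)

img = rng.random((28, 28))                     # stand-in for a digit image
tokens = conv_patch_embed(img, w_conv, b_conv)             # (49, 16)
tokens = tokens + self_attention(tokens, wq, wk_, wv)      # residual attention block
logits = tokens.mean(axis=0) @ w_cls + b_cls               # pool tokens, classify
probs = softmax(logits)                                    # 10 digit probabilities
```

The conv stem gives the model local inductive bias while attention lets every token attend to every other, which is the complementarity the hybrid approach exploits.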
Pages: 10