Optimized Vision Transformers for Superior Plant Disease Detection

Cited by: 0
Authors
Ouamane, Abdelmalik [1 ,2 ]
Chouchane, Ammar [2 ,3 ]
Himeur, Yassine [4 ]
Miniaoui, Sami [4 ]
Atalla, Shadi [4 ]
Mansoor, Wathiq [4 ]
Al-Ahmad, Hussain [4 ]
Affiliations
[1] Univ Biskra, Lab LI3C, Biskra 07000, Algeria
[2] Agence Themat Rech Sci St ATRSS, Es Senia 31000, Algeria
[3] Univ Ctr Barika, Barika 05001, Algeria
[4] Univ Dubai, Coll Engn & Informat Technol, Dubai, U Arab Emirates
Source
IEEE ACCESS | 2025, Vol. 13
Keywords
Plant disease detection; vision transformer; convolutional neural network; optimized ViT model; VGG19 and AlexNet
DOI
10.1109/ACCESS.2025.3547416
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Detecting plant diseases is vital for maintaining agricultural productivity and ensuring food security. Advances in computer vision, particularly with Vision Transformers (ViTs), have shown significant potential in improving the accuracy and efficiency of plant disease identification. This study provides a comprehensive evaluation of various ViT parameters to determine the most effective configuration for this purpose. Using the extensive PlantVillage dataset, we systematically analyzed the effects of patch sizes, image resolutions, embedding dimensions, the number of transformer blocks (depth), the number of heads in the multi-head attention layer, and the dimension of the MLP (FeedForward) layer on model performance. We introduced saliency map visualizations to enhance interpretability and evaluate the critical regions contributing to classification decisions, ensuring the approach's transparency and robustness. Our experiments identified the optimal ViT configuration as follows: image size = 224 x 224, patch size = 16, embedding dimension = 512, depth = 6, number of heads = 8, and MLP dimension = 1024. This configuration achieved an impressive accuracy of 99.77% on the PlantVillage dataset. In addition, we incorporated a novel cross-dataset transferability evaluation to validate the generalizability of the proposed model. Comparative analysis with traditional convolutional neural network architectures, such as VGG19 and AlexNet, revealed that our optimized ViT model not only surpasses these models in accuracy but also requires significantly fewer trainable parameters and storage space. The incorporation of a lightweight, domain-specific fine-tuning process ensures the model's adaptability to new datasets with minimal computational overhead. Our findings highlight the scalability and adaptability of ViTs, emphasizing their ability to effectively handle varying image sizes and resolutions. 
Moreover, our approach outperforms recent state-of-the-art methods across multiple databases, underscoring the efficacy of the chosen ViT parameters.
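The optimal configuration reported in the abstract can be written down as a small config object and sanity-checked: the patch grid it implies and a rough trainable-parameter estimate for the encoder. This is an illustrative sketch only; the class, field, and function names below are our own, not the authors' code, and the parameter count is an approximation (class token, position embeddings, and classification head omitted).

```python
from dataclasses import dataclass

@dataclass
class ViTConfig:
    """The configuration reported as optimal in the paper."""
    image_size: int = 224
    patch_size: int = 16
    embed_dim: int = 512
    depth: int = 6
    num_heads: int = 8
    mlp_dim: int = 1024

def num_patches(cfg: ViTConfig) -> int:
    """Sequence length fed to the transformer (excluding any class token)."""
    return (cfg.image_size // cfg.patch_size) ** 2

def approx_param_count(cfg: ViTConfig, channels: int = 3) -> int:
    """Rough encoder parameter estimate, biases included."""
    patch_embed = channels * cfg.patch_size ** 2 * cfg.embed_dim + cfg.embed_dim
    attn = 4 * (cfg.embed_dim * cfg.embed_dim + cfg.embed_dim)  # Q, K, V, output proj
    mlp = (cfg.embed_dim * cfg.mlp_dim + cfg.mlp_dim
           + cfg.mlp_dim * cfg.embed_dim + cfg.embed_dim)       # two linear layers
    norms = 2 * 2 * cfg.embed_dim                               # two LayerNorms per block
    return patch_embed + cfg.depth * (attn + mlp + norms)

cfg = ViTConfig()
print(num_patches(cfg))        # → 196 (a 14 x 14 patch grid)
print(approx_param_count(cfg)) # → 13010432, i.e. roughly 13M parameters
```

The roughly 13M-parameter estimate is consistent with the abstract's claim that the optimized ViT needs significantly fewer trainable parameters than VGG19 (about 138M) or AlexNet (about 61M).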
Pages: 48552-48570 (19 pages)
Related Papers (50 total)
  • [31] Combining Vision Transformers and crane load information for a rope winding detection system
    Picchi, Davide
    Brell-Cokcan, Sigrid
    CONSTRUCTION ROBOTICS, 2025, 9 (1)
  • [32] Visualization Comparison of Vision Transformers and Convolutional Neural Networks
    Shi, Rui
    Li, Tianxing
    Zhang, Liguo
    Yamaguchi, Yasushi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2327 - 2339
  • [33] Learning From Synthetic InSAR With Vision Transformers: The Case of Volcanic Unrest Detection
    Bountos, Nikolaos Ioannis
    Michail, Dimitrios
    Papoutsis, Ioannis
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [34] Advances and Challenges in Computer Vision for Image-Based Plant Disease Detection: A Comprehensive Survey of Machine and Deep Learning Approaches
    Qadri, Syed Asif Ahmad
    Huang, Nen-Fu
    Wani, Taiba Majid
    Bhat, Showkat Ahmad
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 2639 - 2670
  • [36] Vision Transformers with Hierarchical Attention
    Liu, Yun
    Wu, Yu-Huan
    Sun, Guolei
    Zhang, Le
    Chhatkuli, Ajad
    Van Gool, Luc
    MACHINE INTELLIGENCE RESEARCH, 2024, 21 (04) : 670 - 683
  • [37] Constituent Attention for Vision Transformers
    Li, Haoling
    Xue, Mengqi
    Song, Jie
    Zhang, Haofei
    Huang, Wenqi
    Liang, Lingyu
    Song, Mingli
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 237
  • [38] Feature fusion Vision Transformers using MLP-Mixer for enhanced deepfake detection
    Essa, Ehab
    NEUROCOMPUTING, 2024, 598
  • [39] IntelPVT: intelligent patch-based pyramid vision transformers for object detection and classification
    Nimma, Divya
    Zhou, Zhaoxian
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (05) : 1767 - 1778