Optimized Vision Transformers for Superior Plant Disease Detection

Cited by: 0

Authors:

Ouamane, Abdelmalik ^{[1,2]}

Chouchane, Ammar ^{[2,3]}

Himeur, Yassine ^{[4]}

Miniaoui, Sami ^{[4]}

Atalla, Shadi ^{[4]}

Mansoor, Wathiq ^{[4]}

Al-Ahmad, Hussain ^{[4]}

Affiliations:

[1] Univ Biskra, Lab LI3C, Biskra 07000, Algeria

[2] Agence Themat Rech Sci St ATRSS, Es Senia 31000, Algeria

[3] Univ Ctr Barika, Barika 05001, Algeria

[4] Univ Dubai, Coll Engn & Informat Technol, Dubai, U Arab Emirates

Source:

IEEE ACCESS | 2025 / Vol. 13

Keywords:

Plant disease detection; vision transformer; convolutional neural network; optimized ViT model; VGG 19 and AlexNet

DOI:

10.1109/ACCESS.2025.3547416

Chinese Library Classification number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Detecting plant diseases is vital for maintaining agricultural productivity and ensuring food security. Advances in computer vision, particularly with Vision Transformers (ViTs), have shown significant potential in improving the accuracy and efficiency of plant disease identification. This study provides a comprehensive evaluation of various ViT parameters to determine the most effective configuration for this purpose. Using the extensive PlantVillage dataset, we systematically analyzed the effects of patch sizes, image resolutions, embedding dimensions, the number of transformer blocks (depth), the number of heads in the multi-head attention layer, and the dimension of the MLP (FeedForward) layer on model performance. We introduced saliency map visualizations to enhance interpretability and evaluate the critical regions contributing to classification decisions, ensuring the approach's transparency and robustness. Our experiments identified the optimal ViT configuration as follows: image size = 224 × 224, patch size = 16, embedding dimension = 512, depth = 6, number of heads = 8, and MLP dimension = 1024. This configuration achieved an impressive accuracy of 99.77% on the PlantVillage dataset. In addition, we incorporated a novel cross-dataset transferability evaluation to validate the generalizability of the proposed model. Comparative analysis with traditional convolutional neural network architectures, such as VGG19 and AlexNet, revealed that our optimized ViT model not only surpasses these models in accuracy but also requires significantly fewer trainable parameters and storage space. The incorporation of a lightweight, domain-specific fine-tuning process ensures the model's adaptability to new datasets with minimal computational overhead. Our findings highlight the scalability and adaptability of ViTs, emphasizing their ability to effectively handle varying image sizes and resolutions. Moreover, our approach outperforms recent state-of-the-art methods across multiple databases, underscoring the efficacy of the chosen ViT parameters.
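
As a rough illustration of what the configuration reported in the abstract implies, the sketch below derives a few quantities from the stated hyperparameters: the number of patches per image, the length of each flattened patch, and the per-head attention dimension. The class name `ViTConfig`, its method names, and the 3-channel (RGB) input are assumptions made for illustration only; this is not the authors' implementation and it does not build or train a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hyperparameters as listed in the abstract (names are hypothetical)."""
    image_size: int = 224   # input images are image_size x image_size
    patch_size: int = 16    # each patch is patch_size x patch_size
    embed_dim: int = 512    # embedding dimension
    depth: int = 6          # number of transformer blocks
    num_heads: int = 8      # heads in the multi-head attention layer
    mlp_dim: int = 1024     # dimension of the MLP (FeedForward) layer
    channels: int = 3       # assumption: RGB input, not stated in the abstract

    def num_patches(self) -> int:
        """Number of non-overlapping patches the image is split into."""
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        per_side = self.image_size // self.patch_size
        return per_side * per_side

    def patch_vector_length(self) -> int:
        """Length of one flattened patch before it is projected to embed_dim."""
        return self.channels * self.patch_size * self.patch_size

    def head_dim(self) -> int:
        """Width of each attention head when embed_dim is split evenly."""
        if self.embed_dim % self.num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")
        return self.embed_dim // self.num_heads


cfg = ViTConfig()
print(cfg.num_patches())          # 196 patches (14 x 14 grid)
print(cfg.patch_vector_length())  # 768 values per flattened RGB patch
print(cfg.head_dim())             # 64 dimensions per attention head
```

Under these assumptions, a 224 × 224 image with patch size 16 yields a 14 × 14 grid of patches, so each transformer block attends over 196 patch tokens (plus any extra tokens, such as a class token, that a given implementation may add).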

Pages: 48552-48570

Page count: 19

