Dynamic text prompt joint multimodal features for accurate plant disease image captioning

Times cited: 0

Authors:

Liang, Fangfang [1]

Huang, Zilong [1]

Wang, Wenjian [2]

He, Zhenxue [1]

En, Qing [3]

Affiliations:

[1] Hebei Agr Univ, Sch Informat Sci & Technol, Baoding 071001, Hebei, Peoples R China

[2] Baidu, Beijing 100085, Peoples R China

[3] Carleton Univ, Sch Comp Sci, Artificial Intelligence & Machine Learning AIML La, Ottawa, ON K1S 5B6, Canada

Source:

VISUAL COMPUTER | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Plant disease captioning; Multimodal; Dynamic prompts; BLIP; Visual question answering;

DOI:

10.1007/s00371-024-03729-0

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202; 0835;

Abstract:

Plant disease captioning is crucial for agricultural pest and disease prevention. However, generating accurate captions for plant disease images remains challenging because of the limited availability of datasets and complex disease manifestations. In this study, we propose BLIP-DP, a novel approach that dynamically generates prompts to enhance multimodal feature fusion for plant disease image captioning. We manually annotated over 20,000 images from the PlantVillage dataset and developed a model that incorporates a ViT, BERT, and a visual question-answering module. The experimental results demonstrate the effectiveness of our method: it achieves a BLEU-4 score of 83.4, an improvement of 4.1 over the original baseline, and outperforms previous methods. Our work not only improves the accuracy of plant disease descriptions but also paves the way for future research in this underexplored area. The code is available at https://github.com/zilonghh/BLIP-DP.

Pages: 5405-5419

Page count: 15