Multimodal Fine-Grained Transformer Model for Pest Recognition

Times cited: 4

Authors:

Zhang, Yinshuo [1,2]

Chen, Lei [1]

Yuan, Yuan [1]

Affiliations:

[1] Chinese Acad Sci, Inst Intelligent Machines, Hefei Inst Phys Sci, Hefei 230031, Peoples R China

[2] Univ Sci & Technol China, Sci Isl Branch, Grad Sch, Hefei 230026, Peoples R China

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

pest recognition; multimodal representation; fine-grained image recognition; vision transformer; few-shot learning;

DOI:

10.3390/electronics12122620

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Deep learning has shown great potential in smart agriculture, especially in pest recognition. However, existing methods require large datasets and do not exploit the semantic associations between multimodal data. To address these problems, this paper proposes the multimodal fine-grained transformer (MMFGT), a novel pest recognition method that improves the transformer architecture in three respects to meet the needs of few-shot pest recognition. First, the MMFGT extends the transformer with self-supervised contrastive learning to extract target features, reducing its dependence on large amounts of data. Second, fine-grained recognition is integrated into the MMFGT so that attention focuses on the subtly differentiated regions of pest images, improving recognition accuracy. Third, the MMFGT further improves pest recognition by jointly using multimodal information from the pest's image and its natural language description. Extensive experiments show that the MMFGT achieves more competitive results in pest recognition than other strong models such as ResNet, ViT, SwinT, DINO, and EsViT, reaching a recognition accuracy of up to 98.12%, which is 5.92% higher than the state-of-the-art DINO baseline.
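The abstract does not specify the MMFGT's loss functions. As a generic illustration of how contrastive learning can tie an image to its natural language description, here is a minimal sketch of a CLIP-style symmetric InfoNCE loss in plain Python. This is an assumption for illustration only, not the MMFGT objective: the function names, the temperature of 0.07, and the toy 2-D embeddings (stand-ins for image and text encoder outputs) are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def log_softmax_at(logits: Vector, index: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at one index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum


def symmetric_info_nce(image_emb: List[Vector], text_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style loss: (image_i, text_i) is the positive pair,
    every other image/text pairing in the batch acts as a negative."""
    if not image_emb or len(image_emb) != len(text_emb):
        raise ValueError("need a non-empty batch with one description per image")
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cos(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: each row should peak at its diagonal entry.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should peak at its diagonal entry.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Toy usage: correctly paired descriptions give a lower loss than swapped ones.
images = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(symmetric_info_nce(images, matched) < symmetric_info_nce(images, swapped))  # True
```

Averaging both directions makes the loss penalize an image that matches the wrong description as well as a description that matches the wrong image. Lowering the temperature sharpens the softmax and weights hard negatives more heavily.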

Pages: 19
