Multimodal Fine-Grained Transformer Model for Pest Recognition

Cited by: 4
Authors
Zhang, Yinshuo [1, 2]
Chen, Lei [1]
Yuan, Yuan [1]
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, Hefei Inst Phys Sci, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Sci Isl Branch, Grad Sch, Hefei 230026, Peoples R China
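Funding
National Natural Science Foundation of China
Keywords
pest recognition; multimodal representation; fine-grained image recognition; vision transformer; few-shot learning
DOI
10.3390/electronics12122620
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract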
Deep learning has shown great potential in smart agriculture, especially in pest recognition. However, existing methods require large datasets and do not exploit the semantic associations between multimodal data. To address these problems, this paper proposes the multimodal fine-grained transformer (MMFGT), a pest recognition method that improves the transformer architecture in three respects to meet the needs of few-shot pest recognition. First, the MMFGT extends the transformer structure with self-supervised contrastive learning to extract target features, which reduces the reliance on data volume. Second, it integrates fine-grained recognition, focusing attention on the finely differentiated regions of pest images to improve recognition accuracy. Third, it further improves recognition performance by jointly using multimodal information from the pest's image and its natural language description. Extensive experiments show that the MMFGT achieves more competitive results than strong models such as ResNet, ViT, SwinT, DINO, and EsViT on pest recognition tasks, with a recognition accuracy of up to 98.12%, which is 5.92% higher than that of the state-of-the-art DINO method used as the baseline.
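The abstract names contrastive learning as the mechanism that reduces reliance on labelled data, but this record does not give the MMFGT loss. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive objective over paired embeddings (for example, two augmented views of the same pest image). The function names, the temperature value, and the choice of InfoNCE are assumptions made for illustration and are not taken from the paper; DINO-style methods, for instance, use a different self-distillation objective.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not the paper's loss).

    Row i of view_a and row i of view_b form the positive pair;
    every other row of view_b acts as a negative for row i of view_a.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, a_i in enumerate(a):
        # Temperature-scaled cosine similarity of a_i to every candidate in view_b.
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings: matched pairs should score a much lower loss than mismatched ones.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = info_nce(embeddings, embeddings)
loss_mismatched = info_nce(embeddings, swapped)
```

Minimizing a loss of this kind pulls the two views of the same sample together and pushes other samples apart, which is the general sense in which contrastive pretraining can learn features without class labels.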
Pages: 19