Improving Fine-Grained Image Classification With Multimodal Information

Cited by: 2
Authors
Xu, Jie [1]
Zhang, Xiaoqian [1]
Zhao, Changming [2]
Geng, Zili [1]
Feng, Yuren [1]
Miao, Ke [1]
Li, Yunji [2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Image classification; Visualization; Data mining; Birds; Spatiotemporal phenomena; Fuses; Multimodal information; fine-grained image classification; multi-temporal feature fusion; self-attention; dynamic MLP; NETWORK;
DOI
10.1109/TMM.2023.3291819
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Fine-grained image datasets have small inter-class differences and large intra-class differences, which makes fine-grained image classification difficult. Traditional fine-grained image classification methods focus only on the visual features of images; this limitation can be overcome by augmenting them with multimodal information. This paper proposes an improved fine-grained image classification method with multimodal information that comprises multimodal data preprocessing, multimodal feature extraction, multi-temporal feature fusion, and decision correction. The proposed preprocessing method addresses the scattered distribution, difficult processing, and uneven contribution to prediction of multimodal data through normalization, phrase packing, and weighted concatenation. For multimodal feature extraction, the proposed SAMLP (Self-Attention MLP) module combines self-attention with an MLP to capture the internal correlations of multimodal information. The proposed multi-temporal feature fusion is divided into early and late feature fusion: the former adds multimodal information markers to the original image, and the latter uses a multi-cascade dynamic MLP structure to fuse visual and multimodal features. To address the limitations of feature fusion, a decision strategy revises the predictions made from fused features according to the predictions made from multimodal features. Ablation experiments on the INAT18-1K and INAT21-1K datasets show that the method is effective in improving classification with multimodal information. Experiments on the large INAT2021_mini dataset show that the complete method achieves higher accuracy than the state-of-the-art method with negligible loss of efficiency.
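The abstract names two steps that are easy to illustrate in isolation: normalizing and weight-concatenating the multimodal data, and correcting the fused prediction with the multimodal-only prediction. The sketch below is not the authors' implementation. It assumes the multimodal information is latitude, longitude, and day of year (the abstract does not specify the fields), and the function names, the per-field weights, and the blending factor `alpha` are all hypothetical. The blending rule itself is only one plausible reading of "decision correction".

```python
import math


def normalize(value, lo, hi):
    """Min-max scale a value from [lo, hi] to [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def encode_metadata(lat, lon, day_of_year, weights=(1.0, 1.0, 1.0)):
    """Normalize each field, then concatenate with per-field weights.

    Assumption: the fields and the weights are illustrative only.
    """
    lat_n = normalize(lat, -90.0, 90.0)
    lon_n = normalize(lon, -180.0, 180.0)
    # Encode the date cyclically so that late December sits close to early January.
    angle = 2.0 * math.pi * (day_of_year - 1) / 365.0
    date_feats = [math.sin(angle), math.cos(angle)]
    w_lat, w_lon, w_date = weights
    return [w_lat * lat_n, w_lon * lon_n] + [w_date * f for f in date_feats]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def correct_decision(fused_logits, meta_logits, alpha=0.3):
    """Blend fused-feature probabilities with metadata-only probabilities.

    Assumption: a convex combination with a hypothetical factor alpha;
    the paper's actual correction rule is not given in the abstract.
    """
    p_fused = softmax(fused_logits)
    p_meta = softmax(meta_logits)
    blended = [(1.0 - alpha) * f + alpha * m for f, m in zip(p_fused, p_meta)]
    best = max(range(len(blended)), key=blended.__getitem__)
    return best, blended
```

With `alpha=0.0` the fused prediction is returned unchanged; larger values let a confident metadata-only prediction overturn a near-tie between visually similar classes.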
Pages: 2082-2095
Page count: 14
Related papers
50 in total
  • [31] Fine-Grained Image Classification With Gaussian Mixture Layer
    Liang, Jingyun
    Guo, Jinlin
    Liu, Xin
    Lao, Songyang
    IEEE ACCESS, 2018, 6 : 53356 - 53367
  • [32] Improving classification with semi-supervised and fine-grained learning
    Lai, Danyu
    Tian, Wei
    Chen, Long
    PATTERN RECOGNITION, 2019, 88 : 547 - 556
  • [33] Leveraging Fine-Grained Labels to Regularize Fine-Grained Visual Classification
    Wu, Junfeng
    Yao, Li
    Liu, Bin
    Ding, Zheyuan
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON COMPUTER MODELING AND SIMULATION (ICCMS 2019) AND 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND APPLICATIONS (ICICA 2019), 2019, : 133 - 136
  • [34] Improving the Conditional Fine-Grained Image Generation With Part Perception
    Han, Xuan
    You, Mingyu
    Lu, Ping
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4792 - 4804
  • [35] Subtler mixed attention network on fine-grained image classification
    Liu, Chao
    Huang, Lei
    Wei, Zhiqiang
    Zhang, Wenfeng
    APPLIED INTELLIGENCE, 2021, 51 (11) : 7903 - 7916
  • [36] Pixel Saliency Based Encoding for Fine-Grained Image Classification
    Yin, Chao
    Zhang, Lei
    Liu, Ji
    PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT I, 2018, 11256 : 274 - 285
  • [37] Fine-grained Image Classification via Combining Vision and Language
    He, Xiangteng
    Peng, Yuxin
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 7332 - 7340
  • [38] Fine-Grained Clothing Image Classification by Style Feature Description
    Wu M.
    Liu L.
    Fu X.
    Liu L.
    Huang Q.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2019, 31 (05): : 780 - 791
  • [39] An Interactive Deep Learning Method For Fine-grained Image Classification
    Luo, Liumin
    Wang, Mingxia
    Liu, Xiaoqing
    JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2025, 28 (04): : 701 - 708
  • [40] Coordinate feature fusion networks for fine-grained image classification
    Kaiyang Liao
    Gang Huang
    Yuanlin Zheng
    Guangfeng Lin
    Congjun Cao
    Signal, Image and Video Processing, 2023, 17 : 807 - 815