Improving Fine-Grained Image Classification With Multimodal Information

Cited by: 2
Authors
Xu, Jie [1]
Zhang, Xiaoqian [1]
Zhao, Changming [2]
Geng, Zili [1]
Feng, Yuren [1]
Miao, Ke [1]
Li, Yunji [2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[2] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Image classification; Visualization; Data mining; Birds; Spatiotemporal phenomena; Fuses; Multimodal information; fine-grained image classification; multi-temporal feature fusion; self-attention; dynamic MLP; NETWORK;
DOI
10.1109/TMM.2023.3291819
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Fine-grained image datasets have small inter-class differences and large intra-class differences, which makes fine-grained image classification difficult. Traditional fine-grained image classification methods focus only on the visual features of images; this limitation can be overcome by augmenting them with multimodal information. This paper proposes an improved fine-grained image classification method with multimodal information, comprising multimodal data preprocessing, multimodal feature extraction, multi-temporal feature fusion, and decision correction. The proposed preprocessing addresses the scattered distribution, difficult processing, and uneven contribution to prediction of multimodal data through normalization, phrase packing, and weighted concatenation. For multimodal feature extraction, the proposed SAMLP (Self-Attention MLP) module combines self-attention with an MLP to capture the internal correlations of multimodal information. The proposed multi-temporal feature fusion is divided into early and late feature fusion: the former adds multimodal information markers to the original image, and the latter uses a multi-cascade dynamic MLP structure to fuse visual features and multimodal features. To address the limitations of feature fusion, a decision strategy revises the predictions made from fused features according to the predictions made from multimodal features. Ablation experiments on the INAT18-1K and INAT21-1K datasets show that the method effectively improves classification with multimodal information. Experiments on the large INAT2021_mini dataset show that the full method achieves higher accuracy than the state-of-the-art method with negligible efficiency loss.
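The decision-correction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual rule, so the version below is only an assumption: it reweights the class probabilities predicted from fused features by the probabilities predicted from multimodal features alone (a product-of-experts style combination). The function names `softmax` and `correct_decision`, the exponent `alpha`, and the toy logits are all hypothetical and not taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def correct_decision(fused_logits, meta_logits, alpha=0.5):
    """Hypothetical correction rule (an assumption, not the paper's rule).

    Multiplies the fused-feature probabilities by the multimodal-only
    probabilities raised to `alpha`, then renormalizes. With alpha=0 the
    fused prediction is returned unchanged.
    """
    fused = softmax(fused_logits)
    meta = softmax(meta_logits)
    combined = fused * meta ** alpha
    return combined / combined.sum(axis=-1, keepdims=True)


# Toy example: the fused branch slightly prefers class 0, while the
# multimodal branch (e.g. location or date metadata) strongly favors class 1.
fused_logits = np.array([[2.0, 1.9, 0.1]])
meta_logits = np.array([[0.0, 3.0, 0.0]])

corrected = correct_decision(fused_logits, meta_logits)
print(softmax(fused_logits).argmax(axis=-1))  # prediction before correction
print(corrected.argmax(axis=-1))              # prediction after correction
```

In this toy case the two visually similar classes are nearly tied under the fused features, and the multimodal prediction breaks the tie. Setting `alpha` closer to 0 trusts the fused prediction more; how the published method balances the two branches is not stated in the abstract.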
Pages: 2082-2095
Number of pages: 14