Multi-granularity sequence generation for hierarchical image classification

Cited by: 1
Authors
Liu, Xinda [1]
Wang, Lili [1, 2]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
hierarchical multi-granularity classification; vision and text transformer; sequence generation; fine-grained image recognition; cross-modality attention;
DOI
10.1007/s41095-022-0332-2
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification codes
081202; 0835;
Abstract
Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and give insufficient consideration to the relationships between the hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree to organize the multi-granularity labels into sequences, then vectorize them and add positional information. The proposed multi-granularity sequence generation method builds a decoder that takes the visual representation sequences and semantic label embeddings as inputs, and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to the semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgsg.
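The label-sequence step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the toy taxonomy in `PARENT`, the `<bos>`/`<eos>` tokens, the coarse-to-fine ordering, and the helper names `label_sequence`, `build_vocab`, `encode`, and `causal_mask`. The sketch walks from a leaf label up to the root of a taxonomic tree, reverses the path into a coarse-to-fine sequence, pairs each token id with its position, and builds the lower-triangular mask that a masked self-attention layer would use so that each label attends only to coarser labels before it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> parent); None marks a top-level label.
PARENT: Dict[str, Optional[str]] = {
    "bird": None,
    "albatross": "bird",
    "laysan albatross": "albatross",
    "sooty albatross": "albatross",
    "sparrow": "bird",
    "house sparrow": "sparrow",
}

# Assumed start and end markers for the generated label sequence.
BOS, EOS = "<bos>", "<eos>"


def label_sequence(leaf: str) -> List[str]:
    """Walk from a leaf up to the root, then order labels coarse to fine."""
    path: List[str] = []
    node: Optional[str] = leaf
    while node is not None:
        path.append(node)
        node = PARENT[node]
    path.reverse()
    return [BOS] + path + [EOS]


def build_vocab(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Assign an integer id to every special token and label."""
    tokens = [BOS, EOS] + sorted(parent)
    return {tok: i for i, tok in enumerate(tokens)}


def encode(seq: List[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Pair each token id with its position index in the sequence."""
    return [(vocab[tok], pos) for pos, tok in enumerate(seq)]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    vocab = build_vocab(PARENT)
    seq = label_sequence("laysan albatross")
    print(seq)
    print(encode(seq, vocab))
    for row in causal_mask(len(seq)):
        print(row)
```

In a full model, the `(token id, position)` pairs would index learned embedding tables, and the mask would be passed to the decoder's self-attention, while a separate cross-attention layer would attend over the image encoder's output sequence.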
Pages: 243-260
Page count: 18