Multi-granularity sequence generation for hierarchical image classification

Cited by: 1
Authors
Liu, Xinda [1]
Wang, Lili [1, 2]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
hierarchical multi-granularity classification; vision and text transformer; sequence generation; fine-grained image recognition; cross-modality attention;
DOI
10.1007/s41095-022-0332-2
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with labels at multiple granularities simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and they insufficiently consider the relationships between the hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree to organize the multi-granularity labels into sequences, then vectorize them and add positional information. The proposed multi-granularity sequence generation method builds a decoder that takes the visual representation sequences and the semantic label embeddings as inputs, and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to the semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgsg.
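The label-side preparation described in the abstract (traversing the taxonomic tree, organizing labels into a sequence, and masking later positions in the decoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy taxonomy, the coarse-to-fine ordering, the `<bos>`/`<eos>` tokens, and the helper names `label_sequence`, `build_vocab`, and `causal_mask` are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child -> parent (None marks a top-level label).
PARENT: Dict[str, Optional[str]] = {
    "Passeriformes": None,
    "Corvidae": "Passeriformes",
    "Blue Jay": "Corvidae",
    "Fish Crow": "Corvidae",
}


def label_sequence(leaf: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Walk from a leaf up to the root, then reverse into coarse-to-fine order."""
    path: List[str] = []
    node: Optional[str] = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def build_vocab(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Assign token ids; ids 0 and 1 are reserved for assumed <bos>/<eos> tokens."""
    vocab = {"<bos>": 0, "<eos>": 1}
    for name in sorted(parent):
        vocab[name] = len(vocab)
    return vocab


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


seq = label_sequence("Blue Jay", PARENT)
vocab = build_vocab(PARENT)
token_ids = [vocab["<bos>"]] + [vocab[s] for s in seq] + [vocab["<eos>"]]
# The decoder input drops the last token, so the mask covers len - 1 positions.
mask = causal_mask(len(token_ids) - 1)
```

In a full model, `token_ids` would be embedded and combined with positional information, and a mask of this shape would restrict self-attention so that each granularity level depends only on the coarser levels already generated.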
Pages: 243-260
Number of pages: 18