Multi-granularity sequence generation for hierarchical image classification

Cited by: 0
Authors
Xinda Liu
Lili Wang
Affiliations
[1] Beihang University, State Key Laboratory of Virtual Reality Technology and Systems
[2] Peng Cheng Laboratory
Source
Computational Visual Media | 2024 / Vol. 10
Keywords
hierarchical multi-granularity classification; vision and text transformer; sequence generation; fine-grained image recognition; cross-modality attention
DOI
Not available
Abstract
Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with labels at multiple granularities simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and they give insufficient consideration to the relationships between hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree and organize the multi-granularity labels into sequences, then vectorize them and add positional information. The proposed multi-granularity sequence generation method builds a decoder that takes the visual representation sequences and semantic label embeddings as inputs, and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgsg.
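The label-side preparation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy, the coarse-to-fine ordering, the `<bos>` start token, and the helper names `label_sequence` and `causal_mask` are all assumptions made for illustration. It shows one plausible way to turn a path in a taxonomic tree into a token sequence and to build the lower-triangular mask that masked self-attention typically uses so that each label only attends to coarser labels before it.

```python
# Hypothetical toy taxonomy (not from the paper): child label -> parent label.
# A parent of None marks the coarsest level.
PARENT = {
    "Aves": None,
    "Passeriformes": "Aves",
    "Corvidae": "Passeriformes",
    "Corvus corax": "Corvidae",
}


def label_sequence(leaf, parent):
    """Return the label path ending at `leaf`, ordered coarse to fine.

    The coarse-to-fine order is an assumption; the abstract only says the
    labels are organized into sequences by traversing the taxonomic tree.
    """
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Assumed start-of-sequence token, followed by the label path.
seq = ["<bos>"] + label_sequence("Corvus corax", PARENT)

# Map each token to an integer id; positions are simply the list indices.
vocab = {tok: i for i, tok in enumerate(["<bos>"] + list(PARENT))}
token_ids = [vocab[tok] for tok in seq]

mask = causal_mask(len(token_ids))
```

In a full model, `token_ids` would be embedded and combined with positional information, and `mask` would restrict the decoder's self-attention; the cross-modality attention over visual features is outside the scope of this sketch.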
Pages: 243-260
Page count: 17