Multi-granularity sequence generation for hierarchical image classification

Cited by: 1

Authors:

Liu, Xinda [1]

Wang, Lili [1,2]

Affiliations:

[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China

[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China

Source:

COMPUTATIONAL VISUAL MEDIA | 2024 / Vol. 10 / No. 2

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

Keywords:

hierarchical multi-granularity classification; vision and text transformer; sequence generation; fine-grained image recognition; cross-modality attention;

DOI:

10.1007/s41095-022-0332-2

Chinese Library Classification:

TP31 [Computer Software]

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Hierarchical multi-granularity image classification is a challenging task that aims to tag each given image with multiple granularity labels simultaneously. Existing methods tend to overlook that different image regions contribute differently to label prediction at different granularities, and also insufficiently consider relationships between the hierarchical multi-granularity labels. We introduce a sequence-to-sequence mechanism to overcome these two problems and propose a multi-granularity sequence generation (MGSG) approach for the hierarchical multi-granularity image classification task. Specifically, we introduce a transformer architecture to encode the image into visual representation sequences. Next, we traverse the taxonomic tree and organize the multi-granularity labels into sequences, and vectorize them and add positional information. The proposed multi-granularity sequence generation method builds a decoder that takes visual representation sequences and semantic label embedding as inputs, and outputs the predicted multi-granularity label sequence. The decoder models dependencies and correlations between multi-granularity labels through a masked multi-head self-attention mechanism, and relates visual information to the semantic label information through a cross-modality attention mechanism. In this way, the proposed method preserves the relationships between labels at different granularity levels and takes into account the influence of different image regions on labels with different granularities. Evaluations on six public benchmarks qualitatively and quantitatively demonstrate the advantages of the proposed method. Our project is available at https://github.com/liuxindazz/mgsg.
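The label-side steps described in the abstract (traversing the taxonomic tree, ordering labels into a sequence, and masking later positions in the decoder) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy, the `<bos>`/`<eos>` tokens, and all function names are assumptions made for illustration, and the coarse-to-fine ordering is one plausible way to serialize a path in the tree.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); not taken from the paper's benchmarks.
PARENT: Dict[str, Optional[str]] = {
    "bird": None,
    "albatross": "bird",
    "laysan_albatross": "albatross",
    "sooty_albatross": "albatross",
    "sparrow": "bird",
    "house_sparrow": "sparrow",
}


def label_sequence(leaf: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Walk from a fine-grained label up to the root, then reverse
    so the sequence runs from coarse to fine."""
    path: List[str] = []
    node: Optional[str] = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def build_vocab(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Assign an integer id to every label plus two assumed special tokens."""
    tokens = ["<bos>", "<eos>"] + sorted(parent)
    return {tok: i for i, tok in enumerate(tokens)}


def encode_target(leaf: str, parent: Dict[str, Optional[str]],
                  vocab: Dict[str, int]) -> List[int]:
    """Turn a leaf label into the id sequence a decoder would be trained to emit."""
    tokens = ["<bos>"] + label_sequence(leaf, parent) + ["<eos>"]
    return [vocab[t] for t in tokens]


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i),
    which is the constraint a masked self-attention layer enforces."""
    return [[j <= i for j in range(length)] for i in range(length)]


if __name__ == "__main__":
    vocab = build_vocab(PARENT)
    print(label_sequence("laysan_albatross", PARENT))
    print(encode_target("laysan_albatross", PARENT, vocab))
    for row in causal_mask(3):
        print(row)
```

In a full model, the id sequence would be embedded and combined with positional information before entering the decoder, and the mask would be passed to the self-attention layers so that each granularity level only sees the coarser labels that precede it.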

Citation

Pages: 243-260

Page count: 18
