MLT-Trans: Multi-level Token Transformer for Hierarchical Image Classification

Cited by: 1
Authors
Sifuentes, Tanya Boone [1 ]
Nazari, Asef [1 ]
Bouadjenek, Mohamed Reda [1 ]
Razzak, Imran [2 ]
Affiliations
[1] Deakin Univ, Waurn Ponds, Vic 3216, Australia
[2] Univ New South Wales, Sydney, NSW 2052, Australia
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT III, PAKDD 2024 | 2024, Vol. 14647
Keywords
Hierarchical classification; Image processing; Transformer; Class tokens; Hierarchy taxonomy;
DOI
10.1007/978-981-97-2259-4_29
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper addresses Multi-level Hierarchical Classification (MLHC) of images, presenting a novel architecture that exploits the "[CLS]" (classification) token within transformers, a token often disregarded in computer vision tasks. Our primary goal is to utilize the information of every [CLS] token in a hierarchical manner. Toward this aim, we introduce the Multi-level Token Transformer (MLT-Trans). This model, trained with sharpness-aware minimization and a hierarchical loss function based on knowledge distillation, can be adapted to various transformer-based networks; we choose the Swin Transformer as the backbone. Empirical results across diverse hierarchical datasets confirm the efficacy of our approach. The findings highlight the potential of combining transformers and [CLS] tokens, demonstrating improvements in hierarchical evaluation metrics and accuracy gains of up to 5.7% on the last level compared to the base network, thereby supporting the adoption of MLT-Trans for MLHC.
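The abstract mentions a hierarchical loss based on knowledge distillation but this record does not specify its form. As an illustration only, a minimal NumPy sketch of one common formulation is shown below: per-level cross-entropy plus a KL term that encourages child-level predictions, aggregated through the class taxonomy, to agree with the parent level. The function name, the `child_to_parent` mapping, and the `alpha` weight are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hierarchical_kd_loss(logits_parent, logits_child, child_to_parent,
                         y_parent, y_child, alpha=0.5):
    """Cross-entropy at each hierarchy level plus a distillation-style
    KL term between the parent distribution and the child distribution
    aggregated to parent classes via the taxonomy (illustrative sketch)."""
    p_parent = softmax(logits_parent)
    p_child = softmax(logits_child)

    # Aggregate child probabilities to parent classes using the taxonomy map.
    agg = np.zeros_like(p_parent)
    for child_cls, parent_cls in enumerate(child_to_parent):
        agg[:, parent_cls] += p_child[:, child_cls]

    n = len(y_parent)
    ce_parent = -np.log(p_parent[np.arange(n), y_parent] + 1e-12).mean()
    ce_child = -np.log(p_child[np.arange(n), y_child] + 1e-12).mean()

    # KL(parent || aggregated child): penalizes cross-level inconsistency.
    kl = (p_parent * (np.log(p_parent + 1e-12)
                      - np.log(agg + 1e-12))).sum(axis=-1).mean()
    return ce_parent + ce_child + alpha * kl
```

When the child predictions are consistent with the parent predictions under the taxonomy, the KL term vanishes and the loss reduces to the sum of the per-level cross-entropies.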
Pages: 385–396
Page count: 12