MLT-Trans: Multi-level Token Transformer for Hierarchical Image Classification

Cited by: 1
Authors
Sifuentes, Tanya Boone [1 ]
Nazari, Asef [1 ]
Bouadjenek, Mohamed Reda [1 ]
Razzak, Imran [2 ]
Affiliations
[1] Deakin Univ, Waurn Ponds, Vic 3216, Australia
[2] Univ New South Wales, Sydney, NSW 2052, Australia
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT III, PAKDD 2024 | 2024, Vol. 14647
Keywords
Hierarchical classification; Image processing; Transformer; Class tokens; Hierarchy taxonomy;
DOI
10.1007/978-981-97-2259-4_29
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper addresses Multi-level Hierarchical Classification (MLHC) of images, presenting a novel architecture that exploits the "[CLS]" (classification) token within transformers, a token often disregarded in computer vision tasks. Our primary goal is to utilize the information of every [CLS] token in a hierarchical manner. Toward this aim, we introduce the Multi-level Token Transformer (MLT-Trans). This model, trained with sharpness-aware minimization and a hierarchical loss function based on knowledge distillation, can be adapted to various transformer-based networks; we choose the Swin Transformer as the backbone. Empirical results across diverse hierarchical datasets confirm the efficacy of our approach. The findings highlight the potential of combining transformers and [CLS] tokens, demonstrating improvements in hierarchical evaluation metrics and accuracy gains of up to 5.7% on the last level compared to the base network, thereby supporting the adoption of MLT-Trans for MLHC.
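The abstract mentions a hierarchical loss based on knowledge distillation but this record does not specify its form. As an illustration only, a minimal NumPy sketch of one common formulation is shown below: per-level cross-entropy plus a KL term that encourages child-level predictions, aggregated through the class taxonomy, to agree with the parent level. The function name, the `child_to_parent` mapping, and the `alpha` weight are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hierarchical_kd_loss(logits_parent, logits_child, child_to_parent,
                         y_parent, y_child, alpha=0.5):
    """Cross-entropy at each hierarchy level plus a distillation-style
    KL term between the parent distribution and the child distribution
    aggregated to parent classes via the taxonomy (illustrative sketch)."""
    p_parent = softmax(logits_parent)
    p_child = softmax(logits_child)

    # Aggregate child probabilities to parent classes using the taxonomy map.
    agg = np.zeros_like(p_parent)
    for child_cls, parent_cls in enumerate(child_to_parent):
        agg[:, parent_cls] += p_child[:, child_cls]

    n = len(y_parent)
    ce_parent = -np.log(p_parent[np.arange(n), y_parent] + 1e-12).mean()
    ce_child = -np.log(p_child[np.arange(n), y_child] + 1e-12).mean()

    # KL(parent || aggregated child): penalizes cross-level inconsistency.
    kl = (p_parent * (np.log(p_parent + 1e-12)
                      - np.log(agg + 1e-12))).sum(axis=-1).mean()
    return ce_parent + ce_child + alpha * kl
```

When the child predictions are consistent with the parent predictions under the taxonomy, the KL term vanishes and the loss reduces to the sum of the per-level cross-entropies.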
Pages: 385–396
Page count: 12