Identifying Critical Tokens for Accurate Predictions in Transformer-Based Medical Imaging Models

Cited: 0
Authors
Kang, Solha [1]
Vankerschaver, Joris [1,2]
Ozbulak, Utku [1,3]
Affiliations
[1] Ghent Univ Global Campus, Ctr Biosyst & Biotech Data Sci, Incheon, South Korea
[2] Univ Ghent, Dept Appl Math Comp Sci & Stat, Ghent, Belgium
[3] Univ Ghent, Dept Elect & Informat Syst, Ghent, Belgium
Source
MACHINE LEARNING IN MEDICAL IMAGING, PT II, MLMI 2024 | 2025, Vol. 15242
Keywords
DOI
10.1007/978-3-031-73290-4_17
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the advancements in self-supervised learning (SSL), transformer-based computer vision models have recently demonstrated superior results compared to convolutional neural networks (CNNs) and are poised to dominate the field of artificial intelligence (AI)-based medical imaging in the upcoming years. Nevertheless, similar to CNNs, unveiling the decision-making process of transformer-based models remains a challenge. In this work, we take a step towards demystifying the decision-making process of transformer-based medical imaging models and propose "Token Insight", a novel method that identifies the critical tokens that contribute to the prediction made by the model. Our method relies on the principled approach of token discarding native to transformer-based models, requires no additional module, and can be applied to any transformer model. Using the proposed approach, we quantify the importance of each token based on its contribution to the prediction, enabling a more nuanced understanding of the model's decisions. Our experimental results, showcased on the problem of colonic polyp identification using both supervised and self-supervised pretrained vision transformers, indicate that Token Insight contributes to more transparent and interpretable transformer-based medical imaging models, fostering trust and facilitating broader adoption in clinical settings.
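The abstract describes scoring each token by its contribution to the prediction via token discarding. As a rough illustration of that idea (not the paper's actual implementation), the sketch below drops one token at a time and records how much the class logit falls; `toy_vit_logit` is a hypothetical stand-in for a real vision transformer's forward pass, and the function and variable names are assumptions for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_vit_logit(tokens):
    """Hypothetical stand-in for a ViT's class logit: mean-pool the patch
    tokens and apply a fixed linear head. In practice this would be a real
    transformer forward pass; only the discard-and-rescore loop matters."""
    head = np.linspace(1.0, 2.0, tokens.shape[1])
    return float(tokens.mean(axis=0) @ head)

def token_importance(tokens, logit_fn=toy_vit_logit):
    """Score each token by how much the prediction logit drops when that
    single token is discarded from the input sequence."""
    base = logit_fn(tokens)
    drops = np.empty(len(tokens))
    for i in range(len(tokens)):
        reduced = np.delete(tokens, i, axis=0)  # forward pass without token i
        drops[i] = base - logit_fn(reduced)     # contribution of token i
    return drops

tokens = rng.normal(size=(8, 4))    # 8 patch tokens, embedding dim 4
scores = token_importance(tokens)
ranking = np.argsort(scores)[::-1]  # most critical tokens first
```

Because transformers accept variable-length token sequences, no retraining or auxiliary module is needed for this kind of per-token occlusion; each score comes from one extra forward pass on the shortened sequence.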
Pages: 169-179
Page count: 11
Related Papers
30 references in total
[1]  
Dosovitskiy A, 2020, arXiv, arXiv:2010.11929, DOI 10.48550/arXiv.2010.11929
[2]   Why Attentions May Not Be Interpretable? [J].
Bai, Bing ;
Liang, Jian ;
Zhang, Guanhua ;
Li, Hao ;
Bai, Kun ;
Wang, Fei .
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, :25-34
[3]  
Bastings J, 2020, arXiv, arXiv:2010.05607, DOI 10.48550/arXiv.2010.05607
[4]  
Bolya D, 2022, arXiv, arXiv:2210.09461, DOI 10.48550/arXiv.2210.09461
[5]   Emerging Properties in Self-Supervised Vision Transformers [J].
Caron, Mathilde ;
Touvron, Hugo ;
Misra, Ishan ;
Jegou, Herve ;
Mairal, Julien ;
Bojanowski, Piotr ;
Joulin, Armand .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :9630-9640
[6]   Transformer Interpretability Beyond Attention Visualization [J].
Chefer, Hila ;
Gur, Shir ;
Wolf, Lior .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :782-791
[7]   Exploring Simple Siamese Representation Learning [J].
Chen, Xinlei ;
He, Kaiming .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :15745-15753
[8]   An Empirical Study of Training Self-Supervised Vision Transformers [J].
Chen, Xinlei ;
Xie, Saining ;
He, Kaiming .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :9620-9629
[9]   Shortcut learning in deep neural networks [J].
Geirhos, Robert ;
Jacobsen, Joern-Henrik ;
Michaelis, Claudio ;
Zemel, Richard ;
Brendel, Wieland ;
Bethge, Matthias ;
Wichmann, Felix A. .
NATURE MACHINE INTELLIGENCE, 2020, 2 (11) :665-673
[10]   Which Tokens to Use? Investigating Token Reduction in Vision Transformers [J].
Haurum, Joakim Bruslund ;
Escalera, Sergio ;
Taylor, Graham W. ;
Moeslund, Thomas B. .
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, :773-783