Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images

Cited by: 1

Authors:

Abimouloud, Mouhamed Laid ^{[1,6]}

Bensid, Khaled ^{[2]}

Elleuch, Mohamed ^{[3,6]}

Ammar, Mohamed Ben ^{[4]}

Kherallah, Monji ^{[5,6]}

Affiliations:

[1] Univ Sfax, Natl Engn Sch Sfax, Sfax, Tunisia

[2] Univ Kasdi Merbah Ouargla, Lab Elect Engn LAGE, Ouargla 30000, Algeria

[3] Univ Manouba, Natl Sch Comp Sci ENSI, Manouba, Tunisia

[4] Northern Border Univ, Fac Comp & Informat Technol, Dept Informat Syst, Rafha, Saudi Arabia

[5] Fac Sci Sfax, Sfax, Tunisia

[6] Sfax Univ, Adv Technol Environm & Smart Cities ATES Unit, Sfax, Tunisia

Source:

VISUAL COMPUTING FOR INDUSTRY BIOMEDICINE AND ART | 2025 / Volume 8 / Issue 01

Keywords:

Breast cancer; Convolutional vision transformer; Histopathological images; Multi classification; BreakHis

DOI:

10.1186/s42492-024-00181-8

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The vision transformer (ViT) architecture, with its attention mechanism based on multi-head attention layers, has been widely adopted in computer-aided diagnosis tasks because of its effectiveness in processing medical image information. However, ViTs have a complex architecture that requires high-performance GPUs or CPUs for efficient training and for deployment in real-world medical diagnostic devices, which makes them more demanding than convolutional neural networks (CNNs). This difficulty is compounded in histopathology image analysis, where the images are both limited and complex. In response, this study proposes TokenMixer, a hybrid architecture that combines the strengths of CNNs and ViTs. The architecture aims to improve feature extraction and classification accuracy while shortening training time and reducing the number of parameters. It does so by minimizing the number of input patches used during training, tokenizing the input patches with convolutional layers, and processing the patches across all network layers with transformer encoder layers, for fast and accurate classification of breast cancer tumor subtypes. The TokenMixer mechanism is inspired by the ConvMixer and TokenLearner models. First, the ConvMixer model dynamically generates spatial attention maps using convolutional layers, enabling the extraction of patches from input images to minimize the number of input patches used in training. Second, the TokenLearner model extracts relevant regions from the selected input patches, tokenizes them to improve feature extraction, and trains all tokenized patches in a transformer encoder network. We evaluated the TokenMixer model on the public BreakHis dataset, comparing it with ViT-based and other state-of-the-art methods. Our approach achieved strong results for both binary and multi-class classification of breast cancer subtypes across magnification levels (40x, 100x, 200x, 400x). The model reached accuracies of 97.02% for binary classification and 93.29% for multi-class classification, with decision times of 391.71 s and 1173.56 s, respectively. These results highlight the potential of our hybrid deep ViT-CNN architecture for advancing tumor classification in histopathological images. The source code is available at https://github.com/abimouloud/TokenMixer.
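The token-reduction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that); it is a simplified, TokenLearner-style pooling step written in plain Python under stated assumptions: the function name `reduce_tokens`, the use of a softmax over patches, the random scoring vectors standing in for learned weights, and the sizes `N`, `D`, and `S` are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reduce_tokens(patches, weights):
    """Pool N patch embeddings into S tokens (hypothetical helper).

    patches: N x D list of patch embeddings, e.g. produced by a
             convolutional stem (assumed, not taken from the paper's code).
    weights: S x D list; row s scores every patch for output token s.
             In a trained model these would be learned; here they are random.
    Returns an S x D list of tokens.
    """
    dim = len(patches[0])
    tokens = []
    for w in weights:
        # One score per patch: dot product of the patch with the scoring vector.
        scores = [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in patches]
        # Spatial attention map over the N patches (non-negative, sums to 1).
        attn = softmax(scores)
        # The attention-weighted average of the patches forms one token.
        tokens.append(
            [sum(a * p[d] for a, p in zip(attn, patches)) for d in range(dim)]
        )
    return tokens


random.seed(0)
N, D, S = 64, 8, 4  # illustrative sizes only
patches = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
weights = [[random.gauss(0, 1) for _ in range(D)] for _ in range(S)]
tokens = reduce_tokens(patches, weights)
print(len(patches), "->", len(tokens), "tokens of dim", len(tokens[0]))
```

Because self-attention compares every token with every other token, shrinking 64 patches to 4 tokens in this toy setting cuts the pairwise interactions per attention head from 64 × 64 = 4096 to 4 × 4 = 16, which is the kind of saving that motivates reducing the number of tokens before the transformer encoder.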

References

Pages: 27

47 entries in total

[1] Vision transformer based convolutional neural network for breast cancer histopathological images classification [J].

Abimouloud M.L. ;

Bensid K. ;

Elleuch M. ;

Ammar M.B. ;

Kherallah M. .

Multimedia Tools and Applications, 2024, 83 (39) :86833-86868

[2]

Abunasser Basem S, 2023, Asian Pac J Cancer Prev, V24, P531, DOI 10.31557/APJCP.2023.24.2.531

[3] Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis [J].

Aggarwal, Ravi ;

Sounderajah, Viknesh ;

Martin, Guy ;

Ting, Daniel S. W. ;

Karthikesalingam, Alan ;

King, Dominic ;

Ashrafian, Hutan ;

Darzi, Ara .

NPJ DIGITAL MEDICINE, 2021, 4 (01)

[4] Transfer learning-assisted multi-resolution breast cancer histopathological images classification [J].

Ahmad, Nouman ;

Asghar, Sohail ;

Gillani, Saira Andleeb .

VISUAL COMPUTER, 2022, 38 (08) :2751-2770

[5] FabNet: A Features Agglomeration-Based Convolutional Neural Network for Multiscale Breast Cancer Histopathology Images Classification [J].

Amin, Muhammad Sadiq ;

Ahn, Hyunsik .

CANCERS, 2023, 15 (04)

[6] Vision-Transformer-Based Transfer Learning for Mammogram Classification [J].

Ayana, Gelan ;

Dese, Kokeb ;

Dereje, Yisak ;

Kebede, Yonas ;

Barki, Hika ;

Amdissa, Dechassa ;

Husen, Nahimiya ;

Mulugeta, Fikadu ;

Habtamu, Bontu ;

Choe, Se-Woon .

DIAGNOSTICS, 2023, 13 (02)

[7] Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer [J].

Bejnordi, Babak Ehteshami ;

Veta, Mitko ;

van Diest, Paul Johannes ;

van Ginneken, Bram ;

Karssemeijer, Nico ;

Litjens, Geert ;

van der Laak, Jeroen A. W. M. .

JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2017, 318 (22) :2199-2210

[8] A new transfer learning based approach to magnification dependent and independent classification of breast cancer in histopathological images [J].

Boumaraf, Said ;

Liu, Xiabi ;

Zheng, Zhongshu ;

Ma, Xiaohong ;

Ferkous, Chokri .

BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 63

[9] CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification [J].

Chen, Chun-Fu ;

Fan, Quanfu ;

Panda, Rameswar .

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :347-356

[10]

Dosovitskiy A., 2021, P INT C LEARN REPR I, P11929
