Identifying Malignant Breast Ultrasound Images Using ViT-Patch

Cited by: 33
Authors
Feng, Hao [1]
Yang, Bo [1]
Wang, Jingwen [1]
Liu, Mingzhe [2]
Yin, Lirong [3]
Zheng, Wenfeng [1]
Yin, Zhengtong [4]
Liu, Chao [5]
Affiliations
[1] Univ Elect Sci & Technol, Sch Automation Engn, Chengdu 610000, Peoples R China
[2] Wenzhou Univ Technol, Sch Data Sci & Artificial Intelligence, Wenzhou 325000, Peoples R China
[3] Louisiana State Univ, Dept Geog & Anthropol, Baton Rouge, LA 70803 USA
[4] Guizhou Univ, Coll Resource & Environm Engn, Guiyang 550025, Peoples R China
[5] CNRS UM, LIRMM, UMR 5506, F-34095 Montpellier, France
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 06
Keywords
ViT; ultrasound; classification; detection; attention map; multi-task learning; auxiliary learning;
DOI
10.3390/app13063489
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Recently, the Vision Transformer (ViT) model has been used for various computer vision tasks, owing to its advantages in extracting long-range features. To better integrate the long-range features useful for classification, the standard ViT adds a class token in addition to the patch tokens. Despite state-of-the-art results on some traditional vision tasks, the ViT model typically requires large datasets for supervised training, and thus it still faces challenges in areas where large datasets are difficult to build, such as medical image analysis. In the ViT model, only the output corresponding to the class token is fed to a Multi-Layer Perceptron (MLP) head for classification, while the outputs corresponding to the patch tokens are left unused. In this paper, we propose an improved ViT architecture (called ViT-Patch), which adds a shared MLP head to the output of each patch token to balance the feature learning on the class and patch tokens. In addition to the primary task, which uses the output of the class token to discriminate whether the image is malignant, a secondary task is introduced, which uses the output of each patch token to determine whether the patch overlaps with the tumor area. More interestingly, because the primary and secondary tasks are correlated, the supervisory information added to the patch tokens helps improve the performance of the primary task on the class token. The introduction of secondary supervision information also improves the attention interaction among the class and patch tokens. In this way, the approach reduces the demand on dataset size. The proposed ViT-Patch is validated on a publicly available dataset, and the experimental results show its effectiveness for both malignant identification and tumor localization.
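The secondary task described in the abstract needs one binary label per patch, stating whether that patch overlaps the tumor area. The following is a minimal sketch of how such labels and a combined objective could be formed, not the authors' implementation. The function names, the "any tumor pixel counts as overlap" rule, the use of binary cross-entropy, the averaging over patches, and the `aux_weight` parameter are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def patch_overlap_labels(mask, patch_size):
    """Return one 0/1 label per patch in row-major order.

    Assumption: a patch is labeled 1 if at least one pixel of the
    binary tumor mask inside it is set.
    """
    height, width = len(mask), len(mask[0])
    if height % patch_size or width % patch_size:
        raise ValueError("mask dimensions must be divisible by patch_size")
    labels = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            overlaps = any(
                mask[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            )
            labels.append(1 if overlaps else 0)
    return labels


def binary_cross_entropy(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(class_prob, class_label, patch_probs, patch_labels,
                   aux_weight=1.0):
    """Primary (class token) loss plus weighted secondary (patch token) loss.

    Assumption: the secondary loss is the mean over patches and the two
    terms are combined by a simple weighted sum.
    """
    primary = binary_cross_entropy(class_prob, class_label)
    secondary = sum(
        binary_cross_entropy(p, y) for p, y in zip(patch_probs, patch_labels)
    ) / len(patch_labels)
    return primary + aux_weight * secondary


# Toy 4x4 mask with a small tumor region, split into 2x2 patches.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
toy_labels = patch_overlap_labels(toy_mask, patch_size=2)
```

In this toy case the tumor pixels fall in the two upper patches, so only those receive a positive label. In a real model the probabilities would come from the class-token head and the shared patch-token head; here they are plain floats so the sketch stays dependency-free.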
Pages: 12