Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small Datasets

Cited by: 8
Authors
de Lima, Leandro M. [1,2]
Krohling, Renato A. [1,2]
Affiliations
[1] Univ Fed Espirito Santo, Grad Program Comp Sci, Vitoria, ES, Brazil
[2] Univ Fed Espirito Santo, DEPR, Labcin Nat Inspired Comp Lab, Vitoria, ES, Brazil
Source
INTELLIGENT SYSTEMS, PT II | 2022 / Vol. 13654
Keywords
Transformer; Convolutional neural network; Skin lesion; Multimodal fusion; Classification;
DOI
10.1007/978-3-031-21689-3_21
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skin cancer is one of the most common types of cancer in the world. Different computer-aided diagnosis systems have been proposed to tackle skin lesion diagnosis, most of them based on deep convolutional neural networks. However, recent advances in computer vision, notably transformer-based networks, have achieved state-of-the-art results in many tasks. We explore and evaluate advances in computer vision architectures, training methods and multimodal feature fusion for the skin lesion diagnosis task. Experiments show that PiT (0.800 +/- 0.006), CoaT (0.780 +/- 0.024) and ViT (0.771 +/- 0.018) transformer-based backbone models with MetaBlock fusion achieved state-of-the-art results for balanced accuracy on the PAD-UFES-20 dataset.
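The abstract reports balanced accuracy as a mean with a spread (for example 0.800 +/- 0.006). As a minimal sketch of how such a figure can be computed, the Python below takes balanced accuracy to be the unweighted mean of per-class recall and summarizes several runs with a mean and sample standard deviation. The function names, the toy labels and the use of the sample standard deviation are assumptions made for illustration. The abstract does not say how the authors aggregated their runs.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return mean(recalls)


def summarize(scores):
    """Mean and sample standard deviation of per-run scores (assumed aggregation)."""
    return mean(scores), stdev(scores)


# Toy, made-up labels for three classes (not data from the paper).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 2]

bacc = balanced_accuracy(y_true, y_pred)
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {bacc:.3f}, plain accuracy: {plain_acc:.3f}")
```

Because every class contributes equally regardless of its size, an error on a small class lowers balanced accuracy more than it lowers plain accuracy, which is why the metric is commonly preferred when class frequencies differ.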
Pages: 282-296
Number of pages: 15