A multimodal transformer to fuse images and metadata for skin disease classification

Cited by: 66
Authors
Cai, Gan [1]
Zhu, Yu [1]
Wu, Yue [1]
Jiang, Xiaoben [1]
Ye, Jiongyao [1]
Yang, Dawei [2, 3]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Keywords
Skin disease; Deep learning; Transformer; Multimodal fusion; Attention
DOI
10.1007/s00371-022-02492-4
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Skin disease is rising in prevalence, and diagnosing it remains a challenging clinical task. Deep learning could help meet this challenge. This study proposes a novel neural network for skin disease classification. Because the research datasets consist of skin disease images and clinical metadata, we propose a novel multimodal Transformer with two encoders, one for images and one for metadata, and one decoder that fuses the multimodal information. In the proposed network, a suitable Vision Transformer (ViT) model serves as the backbone for extracting deep image features. The metadata are treated as labels, and a new Soft Label Encoder (SLE) is designed to embed them. In the decoder, a novel Mutual Attention (MA) block is proposed to better fuse image features and metadata features. To evaluate the model's effectiveness, extensive experiments were conducted on a private skin disease dataset and the benchmark dataset ISIC 2018. Compared with state-of-the-art methods, the proposed model performs better and represents an advance in skin disease diagnosis.
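The fusion idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's Mutual Attention block, whose details the abstract does not give. It assumes, purely for illustration, that "mutual" means bidirectional cross-attention: image tokens attend over metadata tokens and vice versa, and the two pooled results are concatenated. All names (`cross_attention`, `mutual_fusion`, `mean_pool`) are hypothetical, there are no learned projections, and random vectors stand in for ViT and SLE outputs.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention without learned projections.

    Each query token attends over all tokens in keys_values, which
    serve as both keys and values (an assumption made for brevity).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return out


def mean_pool(tokens):
    """Average a list of token vectors into a single vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def mutual_fusion(image_tokens, meta_tokens):
    """Hypothetical bidirectional fusion: attend in both directions, then concatenate."""
    img_ctx = cross_attention(image_tokens, meta_tokens)   # image queries metadata
    meta_ctx = cross_attention(meta_tokens, image_tokens)  # metadata queries image
    return mean_pool(img_ctx) + mean_pool(meta_ctx)


random.seed(0)
dim = 8
# Random stand-ins for encoder outputs: 4 image tokens and 3 metadata tokens.
image_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
meta_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

fused = mutual_fusion(image_tokens, meta_tokens)
print(len(fused))  # 16: two pooled vectors of size dim, concatenated
```

In a trainable model the queries, keys, and values would each pass through learned linear projections, and the fused vector would feed a classification head. Both are omitted here to keep the sketch self-contained.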
Pages: 2781-2793
Page count: 13