A multimodal transformer to fuse images and metadata for skin disease classification

Cited by: 0

Authors:

Gan Cai

Yu Zhu

Yue Wu

Xiaoben Jiang

Jiongyao Ye

Dawei Yang

Affiliations:

[1] East China University of Science and Technology, School of Information Science and Engineering

[2] Zhongshan Hospital, Department of Pulmonary and Critical Care Medicine

[3] Fudan University

[4] Shanghai Engineering Research Center of Internet of Things for Respiratory Medicine

Source:

The Visual Computer | 2023 / Vol. 39

Keywords:

Skin disease; Deep learning; Transformer; Multimodal fusion; Attention

DOI:

Not available

Abstract:

Skin disease cases are rising in prevalence, and the diagnosis of skin diseases is always a challenging task in the clinic. Utilizing deep learning to diagnose skin diseases could help to meet these challenges. In this study, a novel neural network is proposed for the classification of skin diseases. Since the datasets for the research consist of skin disease images and clinical metadata, we propose a novel multimodal Transformer, which consists of two encoders for both images and metadata and one decoder to fuse the multimodal information. In the proposed network, a suitable Vision Transformer (ViT) model is utilized as the backbone to extract image deep features. As for metadata, they are regarded as labels and a new Soft Label Encoder (SLE) is designed to embed them. Furthermore, in the decoder part, a novel Mutual Attention (MA) block is proposed to better fuse image features and metadata features. To evaluate the model’s effectiveness, extensive experiments have been conducted on the private skin disease dataset and the benchmark dataset ISIC 2018. Compared with state-of-the-art methods, the proposed model shows better performance and represents an advancement in skin disease diagnosis.

Pages: 2781-2793

Page count: 12
