A multimodal transformer to fuse images and metadata for skin disease classification

Authors
Gan Cai
Yu Zhu
Yue Wu
Xiaoben Jiang
Jiongyao Ye
Dawei Yang
Affiliations
[1] East China University of Science and Technology, School of Information Science and Engineering
[2] Zhongshan Hospital, Department of Pulmonary and Critical Care Medicine
[3] Fudan University
[4] Shanghai Engineering Research Center of Internet of Things for Respiratory Medicine
Source
The Visual Computer | 2023 / Volume 39
Keywords
Skin disease; Deep learning; Transformer; Multimodal fusion; Attention
Abstract
Skin disease cases are rising in prevalence, and the diagnosis of skin diseases is always a challenging task in the clinic. Utilizing deep learning to diagnose skin diseases could help to meet these challenges. In this study, a novel neural network is proposed for the classification of skin diseases. Since the datasets for the research consist of skin disease images and clinical metadata, we propose a novel multimodal Transformer, which consists of two encoders for both images and metadata and one decoder to fuse the multimodal information. In the proposed network, a suitable Vision Transformer (ViT) model is utilized as the backbone to extract image deep features. As for metadata, they are regarded as labels and a new Soft Label Encoder (SLE) is designed to embed them. Furthermore, in the decoder part, a novel Mutual Attention (MA) block is proposed to better fuse image features and metadata features. To evaluate the model’s effectiveness, extensive experiments have been conducted on the private skin disease dataset and the benchmark dataset ISIC 2018. Compared with state-of-the-art methods, the proposed model shows better performance and represents an advancement in skin disease diagnosis.
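The abstract describes a decoder that fuses image features and metadata features through attention, but it does not give the internals of the Mutual Attention (MA) block. The sketch below is therefore not the authors' block. It only illustrates the general idea of two-way cross-attention between two token sets, under stated assumptions: there are no learned projections, the token counts and feature size are arbitrary, and the names `cross_attention`, `mean_pool`, and `mutual_fusion` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query token attends over keys_values.

    Assumption: keys and values are the same tokens and no learned
    projection matrices are applied, to keep the sketch short.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of token vectors into a single vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def mutual_fusion(image_tokens, meta_tokens):
    """Hypothetical fusion: attend in both directions, pool, concatenate."""
    img_attends_meta = cross_attention(image_tokens, meta_tokens)
    meta_attends_img = cross_attention(meta_tokens, image_tokens)
    return mean_pool(img_attends_meta) + mean_pool(meta_attends_img)


# Toy inputs standing in for encoder outputs (random values, not real features).
random.seed(0)
dim = 8
image_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
meta_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

fused = mutual_fusion(image_tokens, meta_tokens)
print(len(fused))  # length of the concatenated fused vector
```

In a real model the fused vector would feed a classification head, and the query, key, and value projections would be learned. Both are omitted here.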
Pages: 2781-2793
Number of pages: 12