A multimodal transformer to fuse images and metadata for skin disease classification

Cited by: 0
Authors
Gan Cai
Yu Zhu
Yue Wu
Xiaoben Jiang
Jiongyao Ye
Dawei Yang
Affiliations
[1] East China University of Science and Technology, School of Information Science and Engineering
[2] Zhongshan Hospital, Department of Pulmonary and Critical Care Medicine
[3] Fudan University
[4] Shanghai Engineering Research Center of Internet of Things for Respiratory Medicine
Source
The Visual Computer | 2023 / Vol. 39, No. 7
Keywords
Skin disease; Deep learning; Transformer; Multimodal fusion; Attention
DOI
Not available
Abstract
The prevalence of skin diseases is rising, and their diagnosis remains a challenging clinical task. Deep learning could help address this challenge. In this study, a novel neural network is proposed for skin disease classification. Because the research datasets consist of both skin disease images and clinical metadata, we propose a novel multimodal Transformer with two encoders, one for images and one for metadata, and a decoder that fuses the multimodal information. In the proposed network, a suitable Vision Transformer (ViT) model serves as the backbone for extracting deep image features. The metadata are treated as labels, and a new Soft Label Encoder (SLE) is designed to embed them. In the decoder, a novel Mutual Attention (MA) block is proposed to better fuse image features and metadata features. To evaluate the model's effectiveness, extensive experiments were conducted on a private skin disease dataset and on the benchmark dataset ISIC 2018. Compared with state-of-the-art methods, the proposed model achieves better performance, representing an advancement in skin disease diagnosis.
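The abstract names three components (a ViT image encoder, a Soft Label Encoder for metadata, and a Mutual Attention decoder block) but does not describe their internals. The pure-Python sketch below illustrates one possible reading of the fusion step and is not the authors' implementation. It assumes that the SLE turns each categorical metadata value into a label-smoothed one-hot vector multiplied by an embedding table, and that Mutual Attention means bidirectional cross-attention (image tokens attend to metadata tokens and vice versa) with residual connections. The names `soft_label_embedding`, `cross_attention`, `mutual_attention`, and `fuse`, the metadata fields, and all numbers are hypothetical; learned query/key/value projections and multiple heads are omitted, and ViT features are replaced by small hand-written token lists.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_embedding(label: int, table: Matrix, eps: float = 0.1) -> List[float]:
    """Hypothetical SLE: mix embedding-table rows with label-smoothed one-hot weights.

    The true class gets weight 1 - eps + eps/k; every other class gets eps/k.
    """
    k = len(table)
    weights = [eps / k + (1.0 - eps if i == label else 0.0) for i in range(k)]
    return [sum(w * row[j] for w, row in zip(weights, table)) for j in range(len(table[0]))]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention of queries over context tokens.

    The context tokens act as both keys and values (no learned projections here).
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(context))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context)


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mutual_attention(image_tokens: Matrix, meta_tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Assumed MA block: each modality attends to the other, plus a residual."""
    img_out = add(image_tokens, cross_attention(image_tokens, meta_tokens))
    meta_out = add(meta_tokens, cross_attention(meta_tokens, image_tokens))
    return img_out, meta_out


def mean_pool(tokens: Matrix) -> List[float]:
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def fuse(image_tokens: Matrix, meta_tokens: Matrix) -> List[float]:
    """Pool each fused modality and concatenate into one feature vector."""
    img_out, meta_out = mutual_attention(image_tokens, meta_tokens)
    return mean_pool(img_out) + mean_pool(meta_out)


if __name__ == "__main__":
    # Stand-ins for ViT patch tokens (3 tokens, dimension 2).
    image_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    # Two hypothetical categorical metadata fields, each with its own embedding table.
    sex_table = [[1.0, 0.0], [0.0, 1.0]]
    site_table = [[0.5, -0.5], [-1.0, 0.2], [0.3, 0.3]]
    meta_tokens = [soft_label_embedding(1, sex_table), soft_label_embedding(2, site_table)]
    print(fuse(image_tokens, meta_tokens))  # a 4-dimensional fused feature vector
```

In this sketch, three image tokens and two metadata tokens of dimension 2 fuse into a single 4-dimensional vector that a classification head could consume. Mean pooling and concatenation are illustrative choices; the paper's decoder may combine the two streams differently.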
Pages: 2781-2793
Number of pages: 12