Co-Attention Fusion Network for Multimodal Skin Cancer Diagnosis

Cited by: 37
Authors
He, Xiaoyu [1]
Wang, Yong [1]
Zhao, Shuang [2]
Chen, Xiang [2]
Affiliations
[1] Central South University, School of Automation, Changsha 410083, People's Republic of China
[2] Central South University, Xiangya Hospital, Department of Dermatology, Changsha 410008, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Skin cancer diagnosis; Convolutional neural networks; Multimodal fusion; Attention mechanism; Dermoscopy; Classification
DOI
10.1016/j.patcog.2022.108990
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, multimodal image-based methods have shown strong performance in skin cancer diagnosis. These methods usually use convolutional neural networks (CNNs) to extract the features of two modalities (i.e., dermoscopy and clinical images) and fuse these features for classification. However, they commonly have two shortcomings: 1) the feature extraction processes of the two modalities are independent and lack cooperation, which may limit the representation ability of the extracted features, and 2) the multimodal fusion operation is a simple concatenation followed by convolutions, which produces coarse fusion features. To address these two issues, we propose a co-attention fusion network (CAFNet), which uses two branches to extract the features of dermoscopy and clinical images and a hyper-branch to refine and fuse these features at all stages of the network. Specifically, the hyper-branch is composed of multiple co-attention fusion (CAF) modules. In each CAF module, we first design a co-attention (CA) block with a cross-modal attention mechanism to achieve cooperation between the two modalities, which enhances the representation ability of the extracted features through mutual guidance between the two modalities. Following the CA block, we further propose an attention fusion (AF) block that dynamically selects appropriate fusion ratios to conduct pixel-wise multimodal fusion, which can generate fine-grained fusion features. In addition, we propose a deep-supervised loss and a combined prediction method to obtain a more robust prediction result. The results show that CAFNet achieves an average accuracy of 76.8% on the seven-point checklist dataset and outperforms state-of-the-art methods. (c) 2022 Elsevier Ltd. All rights reserved.
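To make the idea of pixel-wise fusion with dynamically selected ratios more concrete, the following is a minimal NumPy sketch. It is not the authors' AF block: the abstract does not specify how the fusion ratios are computed, so the sketch assumes a single ratio map obtained from a sigmoid over a per-pixel linear projection of the concatenated features. The names attention_fusion, weight, and bias are hypothetical and introduced only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def attention_fusion(derm, clin, weight, bias=0.0):
    """Fuse two feature maps with a per-pixel ratio (illustrative assumption).

    derm, clin : arrays of shape (C, H, W), features of the dermoscopy and
                 clinical branches.
    weight     : array of shape (2C,), hypothetical projection weights that
                 act like a 1x1 convolution with one output channel.
    bias       : scalar offset added to the projection.

    Returns (fused, ratio), where ratio has shape (H, W) and
    fused = ratio * derm + (1 - ratio) * clin at every pixel.
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([derm, clin], axis=0)
    # Per-pixel linear projection over channels: (H, W).
    logits = np.tensordot(weight, stacked, axes=([0], [0])) + bias
    # Squash to a fusion ratio in (0, 1) for each pixel.
    ratio = sigmoid(logits)
    # Convex combination of the two modalities, broadcast over channels.
    fused = ratio[None, :, :] * derm + (1.0 - ratio[None, :, :]) * clin
    return fused, ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    derm = rng.normal(size=(4, 3, 3))
    clin = rng.normal(size=(4, 3, 3))
    weight = rng.normal(size=8)
    fused, ratio = attention_fusion(derm, clin, weight)
    print(fused.shape, ratio.shape)
```

In this sketch, setting weight to zeros yields a ratio of 0.5 everywhere, so the fusion reduces to a plain average; non-zero weights let the ratio vary from pixel to pixel, which is the contrast with fixed concatenation that the abstract draws.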
Pages: 11