Co-Attention Fusion Network for Multimodal Skin Cancer Diagnosis

Cited by: 37
Authors
He, Xiaoyu [1]
Wang, Yong [1]
Zhao, Shuang [2]
Chen, Xiang [2]
Affiliations
[1] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China
[2] Cent South Univ, Xiangya Hosp, Dept Dermatol, Changsha 410008, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Skin cancer diagnosis; Convolutional neural networks; Multimodal fusion; Attention mechanism; Dermoscopy; Classification
DOI
10.1016/j.patcog.2022.108990
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, multimodal image-based methods have shown strong performance in skin cancer diagnosis. These methods usually use convolutional neural networks (CNNs) to extract the features of two modalities (i.e., dermoscopy and clinical images) and fuse these features for classification. However, they commonly have two shortcomings: 1) the feature extraction processes of the two modalities are independent and lack cooperation, which may limit the representation ability of the extracted features, and 2) the multimodal fusion operation is a simple concatenation followed by convolutions, which yields coarse fusion features. To address these two issues, we propose a co-attention fusion network (CAFNet), which uses two branches to extract the features of dermoscopy and clinical images and a hyper-branch to refine and fuse these features at all stages of the network. Specifically, the hyper-branch is composed of multiple co-attention fusion (CAF) modules. In each CAF module, we first design a co-attention (CA) block with a cross-modal attention mechanism to achieve cooperation between the two modalities, which enhances the representation ability of the extracted features through mutual guidance between the two modalities. Following the CA block, we further propose an attention fusion (AF) block that dynamically selects appropriate fusion ratios to conduct pixel-wise multimodal fusion, which can generate fine-grained fusion features. In addition, we propose a deep-supervised loss and a combined prediction method to obtain a more robust prediction result. The results show that CAFNet achieves an average accuracy of 76.8% on the seven-point checklist dataset and outperforms state-of-the-art methods. (c) 2022 Elsevier Ltd. All rights reserved.
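The two ideas named in the abstract, cross-modal guidance and pixel-wise fusion with a learned ratio, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the internals of the CA or AF blocks, so the gating scheme below (channel gates computed from the other modality, and a per-pixel sigmoid ratio) is an assumption, and the names `co_attention`, `attention_fusion`, `w_a`, `w_b`, and `bias` are hypothetical. Feature maps are represented as plain lists of channels, each channel a flat list of pixel values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def co_attention(f_a, f_b):
    """Cross-modal channel gating (illustrative assumption, not the paper's CA block).

    Each channel of one modality is rescaled by a gate computed from the
    mean activation of the same channel in the other modality.
    """
    gate_a = [sigmoid(sum(ch) / len(ch)) for ch in f_a]
    gate_b = [sigmoid(sum(ch) / len(ch)) for ch in f_b]
    out_a = [[g * v for v in ch] for g, ch in zip(gate_b, f_a)]
    out_b = [[g * v for v in ch] for g, ch in zip(gate_a, f_b)]
    return out_a, out_b


def attention_fusion(f_a, f_b, w_a, w_b, bias=0.0):
    """Pixel-wise fusion with a per-pixel ratio (illustrative assumption).

    w_a and w_b are hypothetical per-channel weights, standing in for a
    1x1 convolution that maps both modalities to one logit per pixel.
    """
    n_pixels = len(f_a[0])
    alphas = []
    for p in range(n_pixels):
        logit = bias
        logit += sum(w * ch[p] for w, ch in zip(w_a, f_a))
        logit += sum(w * ch[p] for w, ch in zip(w_b, f_b))
        alphas.append(sigmoid(logit))  # fusion ratio in (0, 1)
    fused = [
        [alphas[p] * ca[p] + (1.0 - alphas[p]) * cb[p] for p in range(n_pixels)]
        for ca, cb in zip(f_a, f_b)
    ]
    return fused, alphas


# Two channels, three pixels per modality (made-up values).
derm = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
clin = [[3.0, 2.0, 1.0], [1.0, 0.0, 1.0]]
derm_ca, clin_ca = co_attention(derm, clin)
fused, alphas = attention_fusion(derm_ca, clin_ca, [0.5, -0.5], [0.2, 0.1])
```

With all weights set to zero the ratio is 0.5 at every pixel and the fusion reduces to a plain average; nonzero weights let the ratio vary from pixel to pixel, which is the contrast the abstract draws with concatenation followed by convolutions.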
Pages: 11