Interactive CNN and Transformer-Based Cross-Attention Fusion Network for Medical Image Classification

Times Cited: 0
Authors
Cai, Shu [1 ]
Zhang, Qiude [2 ]
Wang, Shanshan [1 ]
Hu, Junjie [1 ]
Zeng, Liang [1 ]
Li, Kaiyan [3 ]
Affiliations
[1] Hubei Univ Technol, Sch Elect & Elect Engn, Wuhan, Peoples R China
[2] Huazhong Univ Sci & Technol, Biomed Engn Dept, Wuhan, Peoples R China
[3] Tongji Hosp, Tongji Med Coll HUST, Wuhan, Peoples R China
Keywords
CNN; cross-attention; feature fusion; transformer;
DOI
10.1002/ima.70077
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Medical images typically contain complex structures and abundant detail, and they exhibit variations in texture, contrast, and noise across different imaging modalities. Different image types express local and global features with varying prominence and importance, making accurate classification highly challenging. Convolutional neural network (CNN)-based approaches are limited by the size of the convolutional kernel, which restricts their ability to capture global contextual information effectively. While transformer-based models can compensate for this limitation by modeling long-range dependencies, they struggle to extract fine-grained local features from images. To address these issues, we propose a novel architecture, the Interactive CNN and Transformer Cross-Attention Fusion Network (IFC-Net). This model leverages the strengths of CNNs for efficient local feature extraction and of transformers for capturing global dependencies, enabling it to preserve both local features and global contextual relationships. Additionally, we introduce a cross-attention fusion module that adaptively adjusts the feature fusion strategy, facilitating efficient integration of local and global features and enabling dynamic information exchange between the CNN and transformer components. Experimental results on four benchmark datasets, ISIC2018, COVID-19, and two liver cirrhosis ultrasound sets (linear array and convex array), demonstrate that the proposed model achieves superior classification performance, outperforming both CNN-only and transformer-only architectures.
Pages: 17
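The abstract describes a cross-attention fusion module that exchanges information between the CNN branch (local features) and the transformer branch (global dependencies). Below is a minimal PyTorch sketch of one plausible form of such a bidirectional cross-attention fusion block. It is not the authors' implementation: the module names, token shapes, residual layout, and gating scheme are all illustrative assumptions.

```python
# Illustrative sketch only, not the IFC-Net implementation from the paper.
# Assumes both branches have already been projected to a common embedding
# dimension and flattened into token sequences of shape (B, N, dim).
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # CNN tokens query transformer tokens: inject global context into local features.
        self.cnn_from_trans = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Transformer tokens query CNN tokens: inject local detail into global features.
        self.trans_from_cnn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_cnn = nn.LayerNorm(dim)
        self.norm_trans = nn.LayerNorm(dim)
        # Learnable gate that adaptively weights the two fused streams (assumed design).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, cnn_tokens: torch.Tensor, trans_tokens: torch.Tensor) -> torch.Tensor:
        # cnn_tokens:   (B, N_c, dim) flattened CNN feature map
        # trans_tokens: (B, N_t, dim) transformer patch embeddings
        local_enriched, _ = self.cnn_from_trans(
            query=self.norm_cnn(cnn_tokens), key=trans_tokens, value=trans_tokens)
        global_enriched, _ = self.trans_from_cnn(
            query=self.norm_trans(trans_tokens), key=cnn_tokens, value=cnn_tokens)
        # Residual connections preserve the original branch information.
        local_enriched = cnn_tokens + local_enriched
        global_enriched = trans_tokens + global_enriched
        # Pool each stream and fuse adaptively into a single classification feature.
        local_vec = local_enriched.mean(dim=1)
        global_vec = global_enriched.mean(dim=1)
        g = self.gate(torch.cat([local_vec, global_vec], dim=-1))
        return g * local_vec + (1.0 - g) * global_vec


if __name__ == "__main__":
    fusion = CrossAttentionFusion(dim=256, num_heads=8)
    cnn_feat = torch.randn(2, 196, 256)   # e.g., a 14x14 CNN feature map, flattened
    vit_feat = torch.randn(2, 197, 256)   # e.g., ViT tokens including the class token
    fused = fusion(cnn_feat, vit_feat)
    print(fused.shape)                    # torch.Size([2, 256])
```

The gated sum at the end is one simple way to let the network adaptively weight local versus global evidence per sample; the fused vector would then feed a standard classification head.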