Due to the physical limitations of imaging sensors, obtaining a single medical image that simultaneously captures functional metabolic information and structural tissue detail remains a significant challenge in clinical diagnosis. To address this, Multimodal Medical Image Fusion (MMIF) has emerged as an effective technique for integrating complementary information from multimodal source images, such as CT, PET, and SPECT, which is critical for a comprehensive understanding of both the anatomical and functional aspects of the human body. A key challenge in MMIF is how to exchange and aggregate this multimodal information. This article rethinks MMIF from the perspective of harmonizing modality gaps and proposes a novel Modality-Aware Interaction Network (MAINet), which leverages cross-modal feature interaction and progressively fuses hierarchical features in graph space. Specifically, we introduce two key modules: the Cascade Modality Interaction (CMI) module and the Dual-Graph Learning (DGL) module. The CMI module, embedded in a three-branch multi-scale encoder, facilitates complementary multimodal feature learning and feeds information back across branches to enhance discriminative feature learning for each modality. In the decoding stage, the DGL module aggregates hierarchical features in two distinct graph spaces, enabling global feature interaction. Moreover, the DGL module incorporates a bottom-up guidance mechanism in which deeper semantic features guide the learning of shallower detail features, enhancing both scale diversity and modality awareness and yielding fusion results with high visual fidelity. Experimental results on medical image datasets demonstrate the superiority of the proposed method over existing fusion approaches in both subjective and objective evaluations. We also validate the proposed method in related applications, including infrared-visible image fusion and medical image segmentation.
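
To make the described data flow concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation: the module internals (SimpleCMI, SimpleDGL, MAINetSketch), the channel width, the number of graph nodes, and the choice of a spatial-node graph plus a channel graph as the two graph spaces are all illustrative assumptions, and the three-branch encoder is simplified to two modality branches feeding a fused stream. It only illustrates cross-modal interaction in the encoder and graph-space aggregation with bottom-up guidance in the decoder.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCMI(nn.Module):
    """Toy cross-modal interaction (assumption, not the paper's CMI):
    each branch is refined by a gate computed from both modalities."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        g = self.gate(torch.cat([feat_a, feat_b], dim=1))
        # feedback: each modality is enhanced by the complementary one
        return feat_a + g * feat_b, feat_b + g * feat_a


class SimpleDGL(nn.Module):
    """Toy dual-graph aggregation (assumption, not the paper's DGL):
    graph space 1 treats spatial regions as nodes, graph space 2 treats
    channels as nodes; deeper semantics guide shallower details bottom-up."""
    def __init__(self, channels, nodes=16):
        super().__init__()
        self.assign = nn.Conv2d(channels, nodes, 1)              # pixel-to-node assignment
        self.node_adj = nn.Linear(nodes, nodes, bias=False)      # spatial-node adjacency
        self.channel_adj = nn.Linear(channels, channels, bias=False)  # channel-node adjacency
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, shallow, deep):
        # bottom-up guidance: upsample deep semantic features to the shallow scale
        deep_up = F.interpolate(deep, size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        x = shallow + deep_up
        b, c, h, w = x.shape
        # graph space 1: project pixels onto nodes, mix nodes, project back
        a = torch.softmax(self.assign(x).flatten(2), dim=-1)           # (B, N, HW)
        nodes = a @ x.flatten(2).transpose(1, 2)                       # (B, N, C)
        nodes = self.node_adj(nodes.transpose(1, 2)).transpose(1, 2)   # node mixing
        spatial = (a.transpose(1, 2) @ nodes).transpose(1, 2).reshape(b, c, h, w)
        # graph space 2: channels as nodes, mixed into a channel-wise gate
        gate = torch.sigmoid(self.channel_adj(x.mean(dim=(-2, -1))))   # (B, C)
        return self.out(spatial * gate[..., None, None]) + shallow


class MAINetSketch(nn.Module):
    """Two-scale sketch: modality encoders with CMI at each scale,
    a DGL-based decoder, and a 1x1 head producing the fused image."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc_a1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.enc_b1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.enc_a2 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.enc_b2 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.cmi1, self.cmi2 = SimpleCMI(ch), SimpleCMI(ch)
        self.dgl = SimpleDGL(ch)
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, img_a, img_b):
        a1, b1 = self.cmi1(self.enc_a1(img_a), self.enc_b1(img_b))
        a2, b2 = self.cmi2(self.enc_a2(a1), self.enc_b2(b1))
        fused = self.dgl(a1 + b1, a2 + b2)   # shallow details guided by deep semantics
        return torch.sigmoid(self.head(fused))


if __name__ == "__main__":
    ct = torch.rand(1, 1, 128, 128)    # e.g., a grayscale CT slice
    pet = torch.rand(1, 1, 128, 128)   # e.g., a PET slice (single channel for simplicity)
    print(MAINetSketch()(ct, pet).shape)   # torch.Size([1, 1, 128, 128])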