Information sparsity guided transformer for multi-modal medical image super-resolution

Cited by: 1
Authors
Lu, Haotian [1 ]
Mei, Jie [2 ]
Qiu, Yu [3 ]
Li, Yumeng [1 ]
Hao, Fangwei [1 ]
Xu, Jing [1 ]
Tang, Lin [4 ]
Affiliations
[1] Nankai Univ, Coll Artificial Intelligence, Natl Key Lab Intelligent Tracking & Forecasting In, Tianjin, Peoples R China
[2] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Sch Robot, Changsha, Peoples R China
[3] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha, Peoples R China
[4] New York Inst Technol, Dept Biol & Chem Sci, New York, NY USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Super-resolution; Sparse transformer; Medical image; Multi-modal; NETWORK; ATTENTION;
DOI
10.1016/j.eswa.2024.125428
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal medical image super-resolution (SR) plays a vital role in enhancing the resolution of medical images, providing more detailed visuals that aid accurate clinical diagnosis. Recently, Transformer-based SR methods have significantly advanced performance in this field thanks to their capacity to capture global dependencies. They usually treat all non-overlapping patches as tokens and densely sample these tokens, without any screening, when computing attention. However, this strategy ignores the spatial sparsity of medical images, leading to redundant or even detrimental computation on less informative regions. Hence, this paper proposes a novel sparsity-guided medical image SR network, namely SG-SRNet, which exploits the spatial sparsity characteristics of medical images. SG-SRNet mainly consists of two components: a sparsity mask (SM) generator for image sparsity estimation, and a sparsity-guided Transformer (SGTrans) for high-resolution image reconstruction. Specifically, the SM generator produces a sparsity mask by minimizing our cross-sparsity loss, so that the mask responds to informative positions. SGTrans first selects the informative patches according to the sparsity mask, then applies the designed cluster-based attention to compute attention only between information-related tokens. Comprehensive experiments on three datasets show that SG-SRNet brings significant performance gains with low computational complexity.
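The screening step described in the abstract can be illustrated with a minimal NumPy sketch: a per-patch score (standing in for the learned sparsity mask) selects the informative tokens, and scaled dot-product attention is computed only among those tokens while the rest pass through unchanged. The function name, the `keep_ratio` parameter, and the plain single-head attention are illustrative assumptions; the paper's learned SM generator, cross-sparsity loss, and cluster-based attention are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparsity_guided_attention(tokens, mask_scores, keep_ratio=0.5):
    """Attend only among patches flagged as informative by a sparsity mask.

    tokens:      (N, d) array of patch embeddings
    mask_scores: (N,) informativeness score per patch (stand-in for the SM output)
    keep_ratio:  fraction of patches screened in for attention
    Returns the updated tokens and the indices of the kept patches.
    """
    n, d = tokens.shape
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(mask_scores)[-k:]        # indices of the k most informative patches
    sel = tokens[keep]                          # (k, d) screened-in tokens
    attn = softmax(sel @ sel.T / np.sqrt(d))    # attention computed only among kept tokens
    out = tokens.copy()                         # uninformative patches pass through untouched
    out[keep] = attn @ sel
    return out, keep
```

Because attention is only evaluated on the k kept tokens, its cost scales with k² rather than N², which is the source of the complexity savings the abstract claims for sparse attention in general.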
Pages: 13