Multi-Modal Prior-Guided Diffusion Model for Blind Image Super-Resolution

Cited by: 1
Authors
Huang, Detian [1]
Song, Jiaxun [1]
Huang, Xiaoqian [2]
Hu, Zhenzhen [3]
Zeng, Huanqiang [1]
Affiliations
[1] Huaqiao Univ, Coll Engn, Quanzhou 362021, Peoples R China
[2] Huaqiao Univ, Coll Informat Sci & Engn, Xiamen 361021, Peoples R China
[3] Hefei Univ Technol, Coll Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Image restoration; Feature extraction; Degradation; Transformers; Diffusion models; Visualization; Superresolution; Navigation; Image reconstruction; Adaptive systems; Blind image super-resolution; diffusion model; multi-modal guidance; transformer model;
DOI
10.1109/LSP.2024.3516699
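Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809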
Abstract
Recently, diffusion models have achieved remarkable success in blind image super-resolution. However, most existing methods rely solely on uni-modal degraded low-resolution images to guide diffusion models for restoring high-fidelity images, resulting in inferior realism. In this letter, we propose a Multi-modal Prior-Guided diffusion model for blind image Super-Resolution (MPGSR), which fine-tunes Stable Diffusion (SD) by utilizing the superior visual-and-textual guidance for restoring realistic high-resolution images. Specifically, our MPGSR involves two stages, i.e., multi-modal guidance extraction and adaptive guidance injection. For the former, we propose a composited transformer and further incorporate it with GPT-CLIP to extract the representative visual-and-textual guidance. For the latter, we design a feature calibration ControlNet to inject the visual guidance and employ the cross-attention layer provided by the frozen SD to inject the textual guidance, thus effectively activating the powerful text-to-image generation potential. Extensive experiments show that our MPGSR outperforms state-of-the-art methods in restoration quality and convergence time.
Pages: 316-320
Page count: 5