Multi-Modal Prior-Guided Diffusion Model for Blind Image Super-Resolution

Cited by: 1

Authors:

Huang, Detian [1]

Song, Jiaxun [1]

Huang, Xiaoqian [2]

Hu, Zhenzhen [3]

Zeng, Huanqiang [1]

Affiliations:

[1] Huaqiao Univ, Coll Engn, Quanzhou 362021, Peoples R China

[2] Huaqiao Univ, Coll Informat Sci & Engn, Xiamen 361021, Peoples R China

[3] Hefei Univ Technol, Coll Comp Sci & Informat Engn, Hefei 230009, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2025 / Vol. 32

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Image restoration; Feature extraction; Degradation; Transformers; Diffusion models; Visualization; Superresolution; Navigation; Image reconstruction; Adaptive systems; Blind image super-resolution; diffusion model; multi-modal guidance; transformer model;

DOI:

10.1109/LSP.2024.3516699

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Recently, diffusion models have achieved remarkable success in blind image super-resolution. However, most existing methods rely solely on uni-modal degraded low-resolution images to guide diffusion models for restoring high-fidelity images, resulting in inferior realism. In this letter, we propose a Multi-modal Prior-Guided diffusion model for blind image Super-Resolution (MPGSR), which fine-tunes Stable Diffusion (SD) by utilizing the superior visual-and-textual guidance for restoring realistic high-resolution images. Specifically, our MPGSR involves two stages, i.e., multi-modal guidance extraction and adaptive guidance injection. For the former, we propose a composited transformer and further incorporate it with GPT-CLIP to extract the representative visual-and-textual guidance. For the latter, we design a feature calibration ControlNet to inject the visual guidance and employ the cross-attention layer provided by the frozen SD to inject the textual guidance, thus effectively activating the powerful text-to-image generation potential. Extensive experiments show that our MPGSR outperforms state-of-the-art methods in restoration quality and convergence time.


Pages: 316-320

Page count: 5
