Real-World Blind Super-Resolution via Feature Matching with Implicit High-Resolution Priors

Cited by: 59
Authors
Chen, Chaofeng [1 ]
Shi, Xinyu [2 ]
Qin, Yipeng [3 ]
Li, Xiaoming [4 ]
Han, Xiaoguang [5 ]
Yang, Tao [6 ]
Guo, Shihui [1 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Univ Waterloo, Waterloo, ON, Canada
[3] Cardiff Univ, Cardiff, Wales
[4] Harbin Inst Technol, Harbin, Peoples R China
[5] Chinese Univ Hong Kong, SSE, Shenzhen, Peoples R China
[6] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
Blind Super-Resolution; FeMaSR; Feature Matching; High-Resolution Prior; VQGAN
DOI
10.1145/3503161.3547833
Chinese Library Classification
TP39 [Computer Applications]
Discipline Code
081203; 0835
Abstract
A key challenge of real-world image super-resolution (SR) is to recover the missing details in low-resolution (LR) images with complex unknown degradations (e.g., downsampling, noise, and compression). Most previous works restore such missing details in the image space. To cope with the high diversity of natural images, they either rely on unstable GANs that are difficult to train and prone to artifacts, or resort to explicit references from high-resolution (HR) images that are usually unavailable. In this work, we propose Feature Matching SR (FeMaSR), which restores realistic HR images in a much more compact feature space. Unlike image-space methods, our FeMaSR restores HR images by matching distorted LR image features to their distortion-free HR counterparts in our pretrained HR priors, and decoding the matched features to obtain realistic HR images. Specifically, our HR priors contain a discrete feature codebook and its associated decoder, which are pretrained on HR images with a Vector Quantized Generative Adversarial Network (VQGAN). Notably, we incorporate a novel semantic regularization into the VQGAN to improve the quality of the reconstructed images. For the feature matching, we first extract LR features with an LR encoder consisting of several Swin Transformer blocks and then follow a simple nearest-neighbour strategy to match them with the pretrained codebook. In particular, we equip the LR encoder with residual shortcut connections to the decoder, which is critical for optimizing the feature matching loss and also helps to compensate for possible feature matching errors. Experimental results show that our approach produces more realistic HR images than previous methods. Code is released at https://github.com/chaofengc/FeMaSR.
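The nearest-neighbour feature matching step described in the abstract can be illustrated with a minimal PyTorch sketch. The codebook size (1024), feature dimension (256), and spatial size below are hypothetical values chosen for illustration, not taken from the released implementation; the snippet only shows the general idea of replacing each LR feature vector with its closest pretrained HR codebook entry.

import torch

def nearest_neighbour_match(lr_feats: torch.Tensor, codebook: torch.Tensor):
    """Replace each LR feature vector by its nearest codebook entry.

    lr_feats: (B, C, H, W) features from an LR encoder.
    codebook: (K, C) pretrained HR codebook (e.g., from a VQGAN).
    Returns the quantized features (B, C, H, W) and matched indices (B, H, W).
    """
    b, c, h, w = lr_feats.shape
    flat = lr_feats.permute(0, 2, 3, 1).reshape(-1, c)        # (B*H*W, C)
    # Squared Euclidean distance from every LR vector to every codebook entry.
    dists = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ codebook.t()
             + codebook.pow(2).sum(1))                        # (B*H*W, K)
    idx = dists.argmin(dim=1)                                 # nearest entry per vector
    quantized = codebook[idx].reshape(b, h, w, c).permute(0, 3, 1, 2)
    return quantized, idx.reshape(b, h, w)

if __name__ == "__main__":
    feats = torch.randn(1, 256, 16, 16)   # hypothetical LR encoder output
    book = torch.randn(1024, 256)         # hypothetical pretrained HR codebook
    q, idx = nearest_neighbour_match(feats, book)
    print(q.shape, idx.shape)             # (1, 256, 16, 16) and (1, 16, 16)

In the paper's pipeline the quantized features would then be passed to the pretrained decoder, with the residual shortcut connections from the LR encoder helping to compensate for matching errors.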
Pages: 1329-1338
Number of pages: 10