Rethinking Super-Resolution as Text-Guided Details Generation

Cited by: 0

Authors:

Ma, Chenxi [1]

Yan, Bo [1]

Lin, Qing [1]

Tan, Weimin [1]

Chen, Siming [2]

Affiliations:

[1] Fudan Univ, Shanghai Collaborat Innovat Ctr Intelligent Visua, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China

[2] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

single image super-resolution; text-guided super-resolution; multi-modal fusion learning

DOI:

10.1145/3503161.3547951

Chinese Library Classification (CLC) code:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Deep neural networks have greatly improved the performance of single image super-resolution (SISR). Conventional methods still restore a single high-resolution (HR) solution based only on the image modality. However, image-level information is insufficient to predict adequate details and photo-realistic visual quality at large upscaling factors (×8, ×16). In this paper, we propose a new perspective that regards SISR as a semantic image detail enhancement problem, aiming to generate semantically reasonable HR images that are faithful to the ground truth. To enhance the semantic accuracy and visual quality of the reconstructed image, we explore multi-modal fusion learning in SISR by proposing a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize information from both the text and image modalities. Unlike existing methods, the proposed TGSR can generate HR image details that match the text descriptions through a coarse-to-fine process. Extensive experiments and ablation studies demonstrate the effectiveness of TGSR, which exploits the text reference to recover realistic images.


Pages: 3461-3469

Page count: 9
