Rethinking Super-Resolution as Text-Guided Details Generation

被引:0
|
作者
Ma, Chenxi [1 ]
Yan, Bo [1 ]
Lin, Qing [1 ]
Tan, Weimin [1 ]
Chen, Siming [2 ]
机构
[1] Fudan Univ, Shanghai Collaborat Innovat Ctr Intelligent Visua, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
来源
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022年
关键词
single image super-resolution; text-guided super-resolution; multi-modal fusion learning;
D O I
10.1145/3503161.3547951
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Deep neural networks have greatly promoted the performance of single image super-resolution (SISR). Conventional methods still resort to restoring the single high-resolution (HR) solution only based on the input of image modality. However, the image-level information is insufficient to predict adequate details and photo-realistic visual quality facing large upscaling factors (x8, x16). In this paper, we propose a new perspective that regards the SISR as a semantic image detail enhancement problem to generate semantically reasonable HR image that are faithful to the ground truth. To enhance the semantic accuracy and the visual quality of the reconstructed image, we explore the multi-modal fusion learning in SISR by proposing a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities. Different from existing methods, the proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process. Extensive experiments and ablation studies demonstrate the effect of the TGSR, which exploits the text reference to recover realistic images.
引用
收藏
页码:3461 / 3469
页数:9
相关论文
共 50 条
  • [1] Text Prior Guided Scene Text Image Super-Resolution
    Ma, Jianqi
    Guo, Shi
    Zhang, Lei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1341 - 1353
  • [2] Text Image Super-Resolution Guided by Text Structure and Embedding Priors
    Huang, Cong
    Peng, Xiulian
    Liu, Dong
    Lu, Yan
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (06)
  • [3] Perceiving Multiple Representations for scene text image super-resolution guided by text recognizer
    Shi, Qin
    Zhu, Yu
    Liu, Yatong
    Ye, Jiongyao
    Yang, Dawei
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 124
  • [4] A Text-Guided Generation and Refinement Model for Image Captioning
    Wang, Depeng
    Hu, Zhenzhen
    Zhou, Yuanen
    Hong, Richang
    Wang, Meng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2966 - 2977
  • [5] CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
    Xu, Sihan
    Ma, Ziqiao
    Huang, Yidong
    Lee, Honglak
    Chai, Joyce
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Text-Guided Molecule Generation with Diffusion Language Model
    Gong, Haisong
    Liu, Qiang
    Wu, Shu
    Wang, Liang
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024, : 109 - 117
  • [7] GARDEN: Generative Prior Guided Network for Scene Text Image Super-Resolution
    Kong, Yuxin
    Ma, Weihong
    Jin, Lianwen
    Xue, Yang
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 196 - 214
  • [8] Rethinking Alignment in Video Super-Resolution Transformers
    Shi, Shuwei
    Gu, Jinjin
    Xie, Liangbin
    Wang, Xintao
    Yang, Yujiu
    Dong, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] RETHINKING SUPER-RESOLUTION: THE BANDWIDTH SELECTION PROBLEM
    Batenkov, Dmitry
    Bhandari, Ayush
    Blu, Thierry
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5087 - 5091
  • [10] Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
    Le, Matthew
    Vyas, Apoorv
    Shi, Bowen
    Karrer, Brian
    Sari, Leda
    Moritz, Rashel
    Williamson, Mary
    Manohar, Vimal
    Adi, Yossi
    Mahadeokar, Jay
    Hsu, Wei-Ning
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,