Advancing Real-World Stereoscopic Image Super-Resolution via Vision-Language Model

Cited by: 0

Authors:

Zhang, Zhe [1,2]

Lei, Jianjun [1]

Peng, Bo [1]

Zhu, Jie [1]

Xu, Liying [1]

Huang, Qingming [3]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Tianjin Univ Commerce, Sch Informat Engn, Tianjin 300134, Peoples R China

[3] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2025 / Vol. 34

Funding:

National Natural Science Foundation of China;

Keywords:

Stereo image processing; Degradation; Superresolution; Visualization; Image reconstruction; Training; Iterative methods; Solid modeling; Computational modeling; Cognition; Super-resolution; stereoscopic image; vision-language model;

DOI:

10.1109/TIP.2025.3546470

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent years have witnessed the remarkable success of the vision-language model in various computer vision tasks. However, how to exploit the semantic language knowledge of the vision-language model to advance real-world stereoscopic image super-resolution remains a challenging problem. This paper proposes a vision-language model-based stereoscopic image super-resolution (VLM-SSR) method, in which the semantic language knowledge in CLIP is exploited to facilitate stereoscopic image SR in a training-free manner. Specifically, by designing visual prompts for CLIP to infer the region similarity, a prompt-guided information aggregation mechanism is presented to capture inter-view information among relevant regions between the left and right views. Besides, driven by the prior knowledge of CLIP, a cognition prior-driven iterative enhancing mechanism is presented to optimize fuzzy regions adaptively. Experimental results on four datasets verify the effectiveness of the proposed method.

Pages: 2187-2197

Page count: 11
