VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution

Cited by: 0

Authors:

Liu, Linlin [1]

Niu, Lele [1]

Tang, Jun [1]

Ding, Yong [1]

Affiliations:

[1] Zhejiang Univ, Coll Integrated Circuits, Hangzhou 310000, Peoples R China

Source:

IEEE ACCESS | 2025 / Vol. 13

Keywords:

Diffusion models; Image reconstruction; Visualization; Superresolution; Noise reduction; Coherence; Noise; Distortion; Feature extraction; Convolution; Video super-resolution; diffusion models; denoising diffusion probabilistic models; deep learning; convolutional neural network; ENHANCEMENT;

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models presents significant challenges for controlling content. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. Specifically, VSRDiff achieves the best scores on the REDS4 dataset in LPIPS, DISTS, and NIQE, with values of 0.1137, 0.0445, and 2.970, respectively. The result will be released at https://github.com/aigcvsr/VSRDiff.

Citation

Pages: 11447-11462

Number of pages: 16

