VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution

Cited by: 0
Authors
Liu, Linlin [1]
Niu, Lele [1]
Tang, Jun [1]
Ding, Yong [1]
Affiliations
[1] Zhejiang Univ, Coll Integrated Circuits, Hangzhou 310000, Peoples R China
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Diffusion models; Image reconstruction; Visualization; Superresolution; Noise reduction; Coherence; Noise; Distortion; Feature extraction; Convolution; Video super-resolution; diffusion models; denoising diffusion probabilistic models; deep learning; convolutional neural network; ENHANCEMENT
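DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract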
Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models presents significant challenges for controlling content. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. Specifically, VSRDiff achieves the best scores on the REDS4 dataset in LPIPS, DISTS, and NIQE, with values of 0.1137, 0.0445, and 2.970, respectively. The result will be released at https://github.com/aigcvsr/VSRDiff.
Pages: 11447 - 11462
Page count: 16
Related papers
72 in total
  • [1] SUPERVEGAN: Super Resolution Video Enhancement GAN for Perceptually Improving Low Bitrate Streams
    Andrei, Silviu S.
    Shapovalova, Nataliya
    Mayol-Cuevas, Walterio
    [J]. IEEE ACCESS, 2021, 9 : 91160 - 91174
  • [2] Constant-roll in the Palatini-R² models
    Antoniadis, Ignatios
    Lykkas, Angelos
    Tamvakis, Kyriakos
    [J]. JOURNAL OF COSMOLOGY AND ASTROPARTICLE PHYSICS, 2020, (04):
  • [3] The Perception-Distortion Tradeoff
    Blau, Yochai
    Michaeli, Tomer
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6228 - 6237
  • [4] Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation
    Caballero, Jose
    Ledig, Christian
    Aitken, Andrew
    Acosta, Alejandro
    Totz, Johannes
    Wang, Zehan
    Shi, Wenzhe
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2848 - 2857
  • [5] Towards Interpretable Video Super-Resolution via Alternating Optimization
    Cao, Jiezhang
    Liang, Jingyun
    Zhang, Kai
    Wang, Wenguan
    Wang, Qin
    Zhang, Yulun
    Tang, Hao
    Van Gool, Luc
    [J]. COMPUTER VISION - ECCV 2022, PT XVIII, 2022, 13678 : 393 - 411
  • [6] BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
    Chan, Kelvin C. K.
    Zhou, Shangchen
    Xu, Xiangyu
    Loy, Chen Change
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5962 - 5971
  • [7] Investigating Tradeoffs in Real-World Video Super-Resolution
    Chan, Kelvin C. K.
    Zhou, Shangchen
    Xu, Xiangyu
    Loy, Chen Change
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5952 - 5961
  • [8] BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond
    Chan, Kelvin C. K.
    Wang, Xintao
    Yu, Ke
    Dong, Chao
    Loy, Chen Change
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4945 - 4954
  • [9] Chen ZK, 2024, PROC CVPR IEEE, P9232, DOI 10.1109/CVPR52733.2024.00882
  • [10] Deformable Convolutional Networks
    Dai, Jifeng
    Qi, Haozhi
    Xiong, Yuwen
    Li, Yi
    Zhang, Guodong
    Hu, Han
    Wei, Yichen
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 764 - 773