iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks

Cited by: 16
Authors
Chadha, Aman [1]
Britto, John [2]
Roja, M. Mani [3]
Affiliations
[1] Stanford Univ, Dept Comp Sci, 450 Serra Mall, Stanford, CA 94305 USA
[2] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
[3] Univ Mumbai, Dept Elect & Telecommun Engn, Mumbai 400032, Maharashtra, India
Keywords
super resolution; video upscaling; frame recurrence; optical flow; generative adversarial networks; convolutional neural networks; image; resolution
DOI
10.1007/s41095-020-0175-7
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Recently, learning-based models have enhanced the performance of single-image super-resolution (SISR). However, applying SISR successively to each video frame leads to a lack of temporal coherency. Convolutional neural networks (CNNs) outperform traditional approaches in terms of image quality metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Generative adversarial networks (GANs), on the other hand, offer a competitive advantage: they mitigate the lack of finer texture details usually seen with CNNs when super-resolving at large upscaling factors. We present iSeeBetter, a novel GAN-based spatio-temporal approach to video super-resolution (VSR) that renders temporally consistent super-resolution videos. iSeeBetter extracts spatial and temporal information from the current and neighboring frames using the concept of recurrent back-projection networks as its generator. Furthermore, to improve the "naturality" of the super-resolved output while eliminating artifacts seen with traditional algorithms, we utilize the discriminator from the super-resolution generative adversarial network (SRGAN). Although using mean squared error (MSE) as the primary loss-minimization objective improves PSNR/SSIM, these metrics may not capture fine details in the image, resulting in a misrepresentation of perceptual quality. To address this, we use a four-fold loss function (MSE, perceptual, adversarial, and total-variation). Our results demonstrate that iSeeBetter offers superior VSR fidelity and surpasses state-of-the-art performance.
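As a rough illustration of the quantities the abstract names, the sketch below computes MSE, PSNR, and a simple anisotropic total-variation term with NumPy, then forms a weighted sum of four loss terms. It is not the authors' implementation: the function names, the placeholder weights, and the choice to pass the perceptual and adversarial terms in as precomputed scalars (they would normally come from a pretrained feature network and a discriminator) are all assumptions made for illustration.

```python
import numpy as np


def mse(sr, hr):
    """Mean squared error between a super-resolved and a reference frame."""
    return float(np.mean((sr - hr) ** 2))


def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_val]."""
    err = mse(sr, hr)
    if err == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_val ** 2 / err))


def total_variation(img):
    """Mean absolute difference between vertically and horizontally
    adjacent pixels (anisotropic TV); img has shape (H, W) or (H, W, C)."""
    dh = np.abs(img[1:, :] - img[:-1, :]).mean()
    dw = np.abs(img[:, 1:] - img[:, :-1]).mean()
    return float(dh + dw)


def combined_loss(sr, hr, perceptual, adversarial,
                  weights=(1.0, 0.1, 0.01, 0.001)):
    """Weighted sum of MSE, perceptual, adversarial, and TV terms.

    `perceptual` and `adversarial` are precomputed scalars here, and the
    default weights are hypothetical placeholders, not values from the paper.
    """
    w_mse, w_perc, w_adv, w_tv = weights
    return (w_mse * mse(sr, hr)
            + w_perc * perceptual
            + w_adv * adversarial
            + w_tv * total_variation(sr))
```

Keeping the four terms behind a single weighted sum makes it easy to see the trade-off the abstract describes: the MSE term pushes PSNR up, while the other three terms pull the output toward sharper, more natural-looking textures.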
Pages: 307-317
Number of pages: 11