Video Super-Resolution via Bidirectional Recurrent Convolutional Networks

Cited by: 175
Authors
Huang, Yan [1, 2]
Wang, Wei [1, 2]
Wang, Liang [1, 2, 3]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, NLPR, Ctr Res Intelligent Percept & Comp CRIPAC, Beijing 100049, Peoples R China
[2] UCAS, Beijing 100049, Peoples R China
[3] Chinese Acad Sci CASIA, Inst Automat, CEBSIT, Beijing 100864, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Deep learning; recurrent neural networks; 3D convolution; video super-resolution; LEARNING ALGORITHM; RESOLUTION;
DOI
10.1109/TPAMI.2017.2701380
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Super-resolving a low-resolution video, namely video super-resolution (SR), is usually handled by either single-image SR or multi-frame SR. Single-image SR processes each video frame independently and ignores the intrinsic temporal dependency between video frames, which plays a very important role in video SR. Multi-frame SR generally extracts motion information, e.g., optical flow, to model the temporal dependency, but often incurs a high computational cost. Considering that recurrent neural networks (RNNs) can model the long-term temporal dependency of video sequences well, we propose a fully convolutional RNN named bidirectional recurrent convolutional network for efficient multi-frame SR. Different from vanilla RNNs, 1) the commonly used full feedforward and recurrent connections are replaced with weight-sharing convolutional connections, which greatly reduce the number of network parameters and model the temporal dependency at a finer level, i.e., patch-based rather than frame-based, and 2) connections from input layers at previous timesteps to the current hidden layer are added by 3D feedforward convolutions, which aim to capture discriminative spatio-temporal patterns for short-term fast-varying motions in local adjacent frames. Owing to the cheap convolutional operations, our model has low computational complexity and runs orders of magnitude faster than other multi-frame SR methods. With its powerful temporal dependency modeling, our model can super-resolve videos with complex motions and achieve good performance.
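The three connection types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-channel feature maps, the ReLU activation, the zero initial state, the summation of the two temporal directions, and all function and parameter names (`conv2d_same`, `recurrent_pass`, `bidirectional_hidden`, `w_ff`, `w_rec`, `w_prev`) are assumptions made for illustration. The 3D feedforward convolution is approximated here by one extra 2D kernel applied to the previous input frame, i.e., a temporal extent of two frames.

```python
import numpy as np

def conv2d_same(x, k):
    """Zero-padded 'same' 2D cross-correlation of a single-channel map x
    with an odd-sized kernel k (hypothetical helper)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def recurrent_pass(frames, w_ff, w_rec, w_prev):
    """One temporal direction. Each hidden map combines:
    - a feedforward convolution of the current frame (w_ff),
    - a recurrent convolution of the previous hidden map (w_rec),
    - a convolution of the previous input frame (w_prev), standing in
      for the temporal slice of a 3D feedforward kernel (assumption)."""
    h_prev = np.zeros(frames[0].shape, dtype=float)
    x_prev = np.zeros(frames[0].shape, dtype=float)
    hidden = []
    for x in frames:
        pre = (conv2d_same(x, w_ff)
               + conv2d_same(h_prev, w_rec)
               + conv2d_same(x_prev, w_prev))
        h = np.maximum(pre, 0.0)  # ReLU is an assumed activation
        hidden.append(h)
        h_prev, x_prev = h, x
    return hidden

def bidirectional_hidden(frames, fwd_kernels, bwd_kernels):
    """Run the pass forward and backward in time, then sum the two
    hidden sequences frame by frame (summation is an assumption)."""
    h_fwd = recurrent_pass(frames, *fwd_kernels)
    h_bwd = recurrent_pass(frames[::-1], *bwd_kernels)[::-1]
    return [a + b for a, b in zip(h_fwd, h_bwd)]
```

Because every connection is a small shared kernel rather than a dense matrix over whole frames, the parameter count in this sketch is independent of the frame size, which is the property the abstract attributes to the weight-sharing convolutional connections.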
Pages: 1015-1028
Page count: 14