Video Super-Resolution via Bidirectional Recurrent Convolutional Networks

Cited by: 175

Authors:

Huang, Yan [1,2]

Wang, Wei [1,2]

Wang, Liang [1,2,3]

Affiliations:

[1] Chinese Acad Sci CASIA, Inst Automat, NLPR, Ctr Res Intelligent Percept & Comp CRIPAC, Beijing 100049, Peoples R China

[2] UCAS, Beijing 100049, Peoples R China

[3] Chinese Acad Sci CASIA, Inst Automat, CEBSIT, Beijing 100864, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2018 / Vol. 40 / No. 04

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China;

Keywords:

Deep learning; recurrent neural networks; 3D convolution; video super-resolution; LEARNING ALGORITHM; RESOLUTION;

DOI:

10.1109/TPAMI.2017.2701380

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Super-resolving a low-resolution video, namely video super-resolution (SR), is usually handled by either single-image SR or multi-frame SR. Single-image SR processes each video frame independently and ignores the intrinsic temporal dependency between frames, which plays a very important role in video SR. Multi-frame SR generally extracts motion information, e.g., optical flow, to model the temporal dependency, but often incurs a high computational cost. Considering that recurrent neural networks (RNNs) can model the long-term temporal dependency of video sequences well, we propose a fully convolutional RNN, named the bidirectional recurrent convolutional network, for efficient multi-frame SR. It differs from vanilla RNNs in two ways: 1) the commonly used full feedforward and recurrent connections are replaced with weight-sharing convolutional connections, which greatly reduce the number of network parameters and model the temporal dependency at a finer level, i.e., patch-based rather than frame-based; and 2) connections from input layers at previous timesteps to the current hidden layer are added via 3D feedforward convolutions, which aim to capture discriminative spatio-temporal patterns for short-term, fast-varying motions in local adjacent frames. Because convolutional operations are cheap, our model has low computational complexity and runs orders of magnitude faster than other multi-frame SR methods. With its powerful temporal dependency modeling, our model can super-resolve videos with complex motions and achieve good performance.
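The two ideas in the abstract (convolutional rather than full connections, and hidden states propagated in both temporal directions) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes 1D "frames" as a stand-in for 2D images, a single hidden layer, kernels shared between the forward and backward passes, and a plain convolution on the previous frame as a simplified stand-in for the 3D feedforward convolution. All function names, kernel names, and sizes are hypothetical and chosen only for illustration.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 'same' 1D cross-correlation with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def relu(values):
    return [max(0.0, v) for v in values]


def add(*vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def recurrent_pass(frames, w_in, w_rec, w_prev, reverse=False):
    """One temporal sweep over the frames (assumed simplification).

    Each hidden state combines three convolutions:
      w_in   on the current frame          (feedforward connection)
      w_rec  on the previous hidden state  (recurrent connection)
      w_prev on the previous input frame   (stand-in for the 3D feedforward term)
    """
    n = len(frames)
    order = range(n - 1, -1, -1) if reverse else range(n)
    zeros = [0.0] * len(frames[0])
    hidden = [None] * n
    prev_h, prev_x = zeros, zeros
    for t in order:
        pre_activation = add(
            conv1d_same(frames[t], w_in),
            conv1d_same(prev_h, w_rec),
            conv1d_same(prev_x, w_prev),
        )
        hidden[t] = relu(pre_activation)
        prev_h, prev_x = hidden[t], frames[t]
    return hidden


def bidirectional_layer(frames, w_in, w_rec, w_prev):
    """Sum the hidden states of a forward sweep and a backward sweep."""
    fwd = recurrent_pass(frames, w_in, w_rec, w_prev)
    bwd = recurrent_pass(frames, w_in, w_rec, w_prev, reverse=True)
    return [add(f, b) for f, b in zip(fwd, bwd)]


def dense_connection_params(n_pixels):
    """Weights in a full connection between two single-channel maps."""
    return n_pixels * n_pixels


def conv_connection_params(kernel_size, in_channels=1, out_channels=1):
    """Weights in a weight-sharing 2D convolutional connection."""
    return kernel_size * kernel_size * in_channels * out_channels
```

As an illustration of the parameter saving, with an assumed 32x32 single-channel feature map, `dense_connection_params(32 * 32)` counts 1,048,576 weights for a full connection, whereas `conv_connection_params(9)` counts 81 weights for one 9x9 kernel. Calling `bidirectional_layer` on a short list of frames returns one combined hidden state per frame, each of which depends on frames both before and after it.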


Pages: 1015-1028

Page count: 14
