VRT: A Video Restoration Transformer

Cited by: 31
Authors
Liang, Jingyun [1]
Cao, Jiezhang [1]
Fan, Yuchen [2]
Zhang, Kai [1,3]
Ranjan, Rakesh [2]
Li, Yawei [1]
Timofte, Radu [1]
Van Gool, Luc [1,4]
Affiliations
[1] Swiss Fed Inst Technol, D-ITET, Comp Vis Lab, CH-8092 Zurich, Switzerland
[2] Meta Inc, Menlo Pk, CA 94025 USA
[3] Nanjing Univ, Sch Intelligence Sci & Technol, Suzhou Campus, Suzhou 215163, Peoples R China
[4] Katholieke Univ Leuven, Dept Elect Engn, Proc Speech and Images PSI, B-3001 Leuven, Belgium
Keywords
Video restoration; video super-resolution; video deblurring; video denoising; video frame interpolation; space-time video super-resolution; ENHANCEMENT; IMAGE
DOI
10.1109/TIP.2024.3372454
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Video restoration aims to restore high-quality frames from low-quality frames. Unlike single image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this with a sliding-window strategy or a recurrent architecture, both of which are restricted to frame-by-frame restoration. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction ability. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal reciprocal self-attention (TRSA) and parallel warping. TRSA divides the video into small clips, on which reciprocal attention is applied for joint motion estimation, feature alignment and feature fusion, while self-attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping is used to further fuse information from neighboring frames by parallel feature warping. Experimental results on five tasks, including video super-resolution, video deblurring, video denoising, video frame interpolation and space-time video super-resolution, demonstrate that VRT outperforms state-of-the-art methods by large margins (up to 2.16 dB) on fourteen benchmark datasets. The code is available at https://github.com/JingyunLiang/VRT.
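The clip-based attention with alternating shifts described in the abstract can be illustrated with a minimal sketch. This is not the official VRT implementation; the clip size, the half-clip shift and the partition_clips helper below are illustrative assumptions only.

# Minimal sketch of temporal clip partitioning with alternating shifts
# (NOT the official VRT code; clip_size, the half-clip shift and the
# helper name are illustrative assumptions).
import torch

def partition_clips(x, clip_size=2, shift=False):
    # x: (T, C) per-frame features. Returns (T // clip_size, clip_size, C).
    # When shift is True, roll the sequence by half a clip so that clip
    # boundaries move and frames can attend across the previous partition.
    if shift:
        x = torch.roll(x, shifts=-(clip_size // 2), dims=0)
    t, c = x.shape
    assert t % clip_size == 0, "pad T to a multiple of clip_size first"
    return x.reshape(t // clip_size, clip_size, c)

# Toy usage: 6 frames with 4-dim features (frame index stored in the features).
frames = torch.arange(6).float().unsqueeze(1).repeat(1, 4)
even_layer = partition_clips(frames, shift=False)  # clips [0,1], [2,3], [4,5]
odd_layer = partition_clips(frames, shift=True)    # clips [1,2], [3,4], [5,0]
# Within each clip, reciprocal attention would align the frames and
# self-attention would extract features; alternating the shift couples clips.
print(even_layer[:, 0, 0], odd_layer[:, 0, 0])

Stacking layers that alternate between the two partitions lets information propagate across the whole sequence, analogous to shifted-window attention in the spatial domain.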
Pages: 2171-2182
Page count: 12