An Efficient Multi-Scale Attention Feature Fusion Network for 4K Video Frame Interpolation

Cited by: 1

Authors:

Ning, Xin ^{[1]}

Li, Yuhang ^{[1]}

Feng, Ziwei ^{[1]}

Liu, Jinhua ^{[1]}

Ding, Youdong ^{[1,2]}

Affiliations:

[1] Shanghai Univ, Coll Shanghai Film, 788 Guangzhong Rd, Shanghai 200072, Peoples R China

[2] Shanghai Engn Res Ctr Mot Picture Special Effects, 788 Guangzhong Rd, Shanghai 200072, Peoples R China

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

4K video frame interpolation; 4K video dataset; self-attention; multi-scale; high frame rate;

DOI:

10.3390/electronics13061037

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Video frame interpolation aims to generate intermediate frames in a video to showcase finer details. However, most methods are only trained and tested on low-resolution datasets, lacking research on 4K video frame interpolation problems. This limitation makes it challenging to handle high-frame-rate video processing in real-world scenarios. In this paper, we propose a 4K video dataset at 120 fps, named UHD4K120FPS, which contains large motion. We also propose a novel framework for solving the 4K video frame interpolation task, based on a multi-scale pyramid network structure. We introduce self-attention to capture long-range dependencies and self-similarities in pixel space, which overcomes the limitations of convolutional operations. To reduce computational cost, we use a simple mapping-based approach to lighten self-attention, while still allowing for content-aware aggregation weights. Through extensive quantitative and qualitative experiments, we demonstrate the excellent performance achieved by our proposed model on the UHD4K120FPS dataset, as well as illustrate the effectiveness of our method for 4K video frame interpolation. In addition, we evaluate the robustness of the model on low-resolution benchmark datasets.


Pages: 16

