Progressive Spatial-temporal Collaborative Network for Video Frame Interpolation

Cited by: 14
Authors
Hu, Mengshun [1 ]
Jiang, Kui [1 ]
Liao, Liang [2 ]
Nie, Zhixiang [1 ]
Xiao, Jing [1 ]
Wang, Zheng [1 ]
Affiliations
[1] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
Video frame interpolation; Collaborative network; Content-guided motion; Motion-guided content; Multi-scale;
DOI
10.1145/3503161.3547875
CLC Number
TP39 [Computer Applications];
Discipline Code
081203 ; 0835 ;
Abstract
Most video frame interpolation (VFI) algorithms infer the intermediate frame from adjacent frames through cascaded motion estimation and content refinement. However, the intrinsic correlations between motion and content are barely investigated, commonly producing interpolated results with inconsistent and blurry contents. We first identify a simple yet essential piece of domain knowledge: the content and motion characteristics of the same object should be homogeneous to a certain degree, and we formulate this consistency into the loss function for model optimization. Based on this, we propose to learn collaborative representations between motion and content, and construct a novel Progressive Spatial-temporal Collaborative Network (Prost-Net) for video frame interpolation. Specifically, we develop a content-guided motion module (CGMM) and a motion-guided content module (MGCM) for individual motion and content representation, where the motion predicted by the CGMM guides the fusion and distillation of contents for intermediate frame interpolation, and vice versa. Furthermore, by embedding this collaborative strategy in a multi-scale framework, Prost-Net progressively optimizes motions and contents in a coarse-to-fine manner, making it robust to challenging scenarios in VFI (e.g., occlusion and large motions). Extensive experiments on benchmark datasets demonstrate that our method significantly outperforms state-of-the-art methods.
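The abstract only sketches the architecture at a high level, so the following minimal PyTorch sketch illustrates the described collaboration rather than the authors' implementation. The module internals, channel widths, the zero-flow initialization, the ProstNetSketch wrapper, and the affinity-based consistency_loss are all assumptions; only the CGMM/MGCM names, the mutual motion-content guidance, the coarse-to-fine multi-scale alternation, and the existence of a consistency loss come from the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CGMM(nn.Module):
    """Content-guided motion module (sketch): content features guide a
    residual refinement of the current flow estimate."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(channels + 2, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, 3, padding=1),
        )

    def forward(self, content, flow):
        # Predict a residual flow update conditioned on content features.
        return flow + self.fuse(torch.cat([content, flow], dim=1))


class MGCM(nn.Module):
    """Motion-guided content module (sketch): the current motion estimate
    guides fusion and distillation of content features."""
    def __init__(self, channels=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels + 2, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, content, flow):
        return content + self.refine(torch.cat([content, flow], dim=1))


class ProstNetSketch(nn.Module):
    """Coarse-to-fine alternation of CGMM and MGCM over a feature pyramid."""
    def __init__(self, channels=64):
        super().__init__()
        self.cgmm = CGMM(channels)
        self.mgcm = MGCM(channels)

    def forward(self, pyramid):
        # pyramid: content features ordered coarsest -> finest, all with the
        # same channel count (a simplifying assumption of this sketch).
        b, _, h, w = pyramid[0].shape
        flow = pyramid[0].new_zeros(b, 2, h, w)
        content = None
        for i, feat in enumerate(pyramid):
            if i > 0:  # upsample and rescale the flow to the finer level
                flow = 2.0 * F.interpolate(flow, size=feat.shape[-2:],
                                           mode="bilinear", align_corners=False)
            flow = self.cgmm(feat, flow)     # content guides motion
            content = self.mgcm(feat, flow)  # motion guides content
        return flow, content


def consistency_loss(content, flow):
    # One plausible homogeneity term (an assumption; the paper's exact loss
    # is not given here): spatial self-affinities computed from content
    # features and from motion should agree for the same objects.
    def affinity(x):
        x = F.normalize(x.flatten(2), dim=1)    # (B, C, H*W)
        return torch.bmm(x.transpose(1, 2), x)  # (B, H*W, H*W)
    return F.l1_loss(affinity(content), affinity(flow))


# Usage on random feature pyramids (coarsest 16x16 up to finest 64x64):
pyramid = [torch.randn(1, 64, s, s) for s in (16, 32, 64)]
flow, content = ProstNetSketch(64)(pyramid)
loss = consistency_loss(content, flow)

Sharing one CGMM/MGCM pair across scales keeps the sketch compact; whether Prost-Net shares weights across pyramid levels is not stated in this record.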
Pages: 2145-2153
Number of pages: 9