Consistent Video Inpainting Using Axial Attention-Based Style Transformer

Cited by: 6
Authors
Junayed, Masum Shah [1]
Islam, Md Baharul [1]
Affiliations
[1] Bahcesehir Univ, Comp Engn, TR-34349 Istanbul, Turkiye
Keywords
Video Inpainting; Deep Encoder; Axial Attention Block; Transformer; Style Manipulation Block; Relative Positional Encoding
DOI
10.1109/TMM.2022.3222932
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Maintaining spatial and temporal consistency in the inpainted region of a video is a challenging problem. Recent research focuses on flow information for synthesizing temporally smooth pixels while neglecting semantic structural coherence across video frames. As a result, these methods suffer from over-smoothing and shadowy outlines that significantly degrade the quality of the inpainted video. To overcome this problem, we propose an end-to-end consistent video inpainting model that substantially improves the inpainted video region. The model employs a deep encoder (DE), an axial attention block (AAB), a style transformer, and a decoder to enhance video inpainting with realistic structure. The DE encodes features effectively, while the AAB recreates all retrieved attributes by merging recoverable multi-scale characteristics with local spatial structures. Then, a novel style transformer with a style manipulation block (SMB) fills the missing area with rich visual elements and temporal coherence. We assess the model's performance on two publicly available benchmark datasets. Experimental results demonstrate that our method outperforms state-of-the-art methods by a large margin. In addition, an extensive ablation study validates the model's performance.
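The abstract names an axial attention block but does not describe its internals. As a rough illustration of the general idea only, the sketch below applies single-head self-attention along the height axis and then along the width axis of one feature map, which avoids attending over all H x W positions at once. This is not the authors' AAB: the function names, the residual connections, the weight shapes, and the omission of relative positional encoding and multi-scale merging are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_sequence_axis(x, wq, wk, wv):
    # Single-head self-attention over the second-to-last axis.
    # x has shape (..., L, C); each leading index is an independent sequence.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def axial_attention(x, params_h, params_w):
    # x: feature map of shape (H, W, C).
    # params_h, params_w: hypothetical (wq, wk, wv) tuples of (C, C) matrices.
    # Height pass: every column is treated as a sequence of length H.
    cols = np.swapaxes(x, 0, 1)                    # (W, H, C)
    cols = attend_sequence_axis(cols, *params_h)   # (W, H, C)
    x = x + np.swapaxes(cols, 0, 1)                # residual, back to (H, W, C)
    # Width pass: every row is treated as a sequence of length W.
    x = x + attend_sequence_axis(x, *params_w)     # residual
    return x
```

For an H x W map, the two passes compute attention scores of size W x H x H and H x W x W, compared with (HW) x (HW) for full 2D self-attention, which is the usual motivation for the axial factorization.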
Pages: 7494-7504
Number of pages: 11