Deep Learning-Based Image and Video Inpainting: A Survey

Cited by: 10
Authors
Quan, Weize [1 ,2 ]
Chen, Jiaxi [1 ,2 ]
Liu, Yanli [3 ]
Yan, Dong-Ming [1 ,2 ]
Wonka, Peter [4 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, MAIS, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[4] King Abdullah Univ Sci & Technol, Comp Elect & Math Sci & Engn Div, Thuwal, Saudi Arabia
Funding
National Natural Science Foundation of China
Keywords
Image inpainting; Video inpainting; Deep learning; Content generation; CROWDED SCENES; PEOPLE; NUMBER; SCALE;
DOI
10.1007/s11263-023-01977-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image and video inpainting is a classic problem in computer vision and computer graphics that aims to fill missing regions of images and videos with plausible, realistic content. With advances in deep learning, significant progress has recently been made on this problem. The goal of this paper is to comprehensively review deep learning-based methods for image and video inpainting. Specifically, we sort existing methods into categories according to their high-level inpainting pipeline; present the deep learning architectures they use, including CNNs, VAEs, GANs, and diffusion models; and summarize techniques for module design. We review the training objectives and the common benchmark datasets. We present evaluation metrics for low-level pixel similarity and high-level perceptual similarity, conduct a performance evaluation, and discuss the strengths and weaknesses of representative inpainting methods. We also discuss related real-world applications. Finally, we discuss open challenges and suggest potential future research directions.
Pages: 2367-2400
Number of pages: 34