ColorMNet: A Memory-Based Deep Spatial-Temporal Feature Propagation Network for Video Colorization

Citations: 0
Authors
Yang, Yixin [1]
Dong, Jiangxin [1]
Tang, Jinhui [1]
Pan, Jinshan [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT IV | 2025 / Volume 15062
Funding
National Natural Science Foundation of China;
Keywords
Exemplar-based video colorization; Deep convolutional neural network; Feature propagation; IMAGE;
DOI
10.1007/978-3-031-73235-5_19
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Effectively exploring spatial-temporal features is important for video colorization. Instead of stacking multiple frames along the temporal dimension or recurrently propagating estimated features, approaches that accumulate errors or cannot exploit information from far-apart frames, we develop a memory-based feature propagation module that establishes reliable connections with features from far-apart frames and alleviates the influence of inaccurately estimated features. To extract better features from each frame for this propagation, we use features from large pretrained visual models to guide the feature estimation of each frame so that the estimated features can model complex scenarios. In addition, we note that adjacent frames usually contain similar content. To exploit this property for better spatial and temporal feature utilization, we develop a local attention module that aggregates features from adjacent frames in a spatial-temporal neighborhood. We combine the memory-based feature propagation module, the large pretrained visual model guided feature estimation module, and the local attention module into an end-to-end trainable network (named ColorMNet) and show that it performs favorably against state-of-the-art methods on both benchmark datasets and real-world scenarios. Our source code and pre-trained models are available at: https://github.com/yyang181/colormnet.
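The idea of reading features from a memory of earlier frames can be illustrated with a generic attention readout. The sketch below is not the authors' implementation (that is in the repository linked above); it assumes plain scaled dot-product attention, and the function names `memory_readout`, `softmax`, and `dot`, as well as the toy feature values, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_readout(query, memory_keys, memory_values):
    """Attention-weighted average of stored values for one query feature.

    Assumption: scaled dot-product similarity between the query and each
    stored key; the paper's actual similarity function may differ.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in memory_keys])
    dim = len(memory_values[0])
    return [
        sum(w * v[i] for w, v in zip(weights, memory_values))
        for i in range(dim)
    ]


# Hypothetical toy memory: key features and colour features from two
# previously processed frames (values invented for illustration).
memory_keys = [[1.0, 0.0], [0.0, 1.0]]
memory_values = [[0.9, 0.1], [0.2, 0.8]]

# A query that closely matches the first stored key.
query = [10.0, 0.0]
readout = memory_readout(query, memory_keys, memory_values)
```

Because the query matches the first key far better than the second, the readout is dominated by the first stored value. Any stored frame can contribute regardless of how far back it is, which is the property the abstract contrasts with recurrent propagation.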
Pages: 336-352
Number of pages: 17