ColorMNet: A Memory-Based Deep Spatial-Temporal Feature Propagation Network for Video Colorization

Times cited: 0

Authors:

Yang, Yixin [1]

Dong, Jiangxin [1]

Tang, Jinhui [1]

Pan, Jinshan [1]

Affiliation:

[1] Nanjing Univ Sci & Technol, Nanjing, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT IV | 2025 / Vol. 15062

Funding:

National Natural Science Foundation of China;

Keywords:

Exemplar-based video colorization; Deep convolutional neural network; Feature propagation; IMAGE;

DOI:

10.1007/978-3-031-73235-5_19

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Effectively exploiting spatial-temporal features is important for video colorization. Rather than stacking multiple frames along the temporal dimension or recurrently propagating estimated features, approaches that either accumulate errors or cannot exploit information from far-apart frames, we develop a memory-based feature propagation module that establishes reliable connections with features from far-apart frames and alleviates the influence of inaccurately estimated features. To extract better per-frame features for this propagation, we use features from large pretrained visual models to guide the feature estimation of each frame, so that the estimated features can model complex scenarios. In addition, adjacent frames usually contain similar content. To exploit this property for better spatial and temporal feature utilization, we develop a local attention module that aggregates features from adjacent frames within a spatial-temporal neighborhood. We integrate the memory-based feature propagation module, the large-pretrained-visual-model-guided feature estimation module, and the local attention module into an end-to-end trainable network, named ColorMNet, and show that it performs favorably against state-of-the-art methods on both benchmark datasets and real-world scenarios. Our source code and pre-trained models are available at: https://github.com/yyang181/colormnet.
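
The two attention-style operations described above, reading propagated features from a memory of earlier frames and aggregating features from a local spatial-temporal window, can be illustrated with a minimal NumPy sketch. It assumes flattened feature maps, scaled dot-product similarity, and no learned projections or memory management. The names `memory_readout` and `local_attention` are hypothetical and do not come from the colormnet repository, whose actual modules may differ substantially.

```python
import numpy as np

# Hypothetical illustration only: function names, shapes, and the scaled
# dot-product similarity are assumptions, not ColorMNet's released code.


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def memory_readout(query_key, mem_keys, mem_values):
    """Propagate features to the current frame by attending over a memory bank.

    query_key:  (Ck, P)  keys of the current frame, P = H * W flattened pixels
    mem_keys:   (Ck, N)  keys of all stored positions from earlier (possibly far-apart) frames
    mem_values: (Cv, N)  features stored alongside those keys
    returns:    (Cv, P)  one convex combination of memory values per query pixel
    """
    ck = query_key.shape[0]
    affinity = mem_keys.T @ query_key / np.sqrt(ck)  # (N, P) similarity scores
    weights = softmax(affinity, axis=0)              # normalize over memory positions
    return mem_values @ weights


def local_attention(query, neighbors, radius=1):
    """Aggregate features from a spatial-temporal neighborhood.

    query:     (C, H, W)     features of the current frame
    neighbors: (T, C, H, W)  features of T adjacent frames
    radius:    half-width of the square spatial window around each pixel
    returns:   (C, H, W)
    """
    _, c, h, w = neighbors.shape
    out = np.zeros(query.shape, dtype=float)
    for y in range(h):
        for x in range(w):
            # Clip the window at the frame borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Stack every neighbor feature in the window across all T frames: (C, M)
            cand = neighbors[:, :, y0:y1, x0:x1].transpose(1, 0, 2, 3).reshape(c, -1)
            scores = cand.T @ query[:, y, x] / np.sqrt(c)  # (M,) similarity to this pixel
            out[:, y, x] = cand @ softmax(scores)
    return out
```

For example, with a 4x4 frame (P = 16) and three memorized frames (N = 48), calling `memory_readout` on arrays of shapes (8, 16), (8, 48), and (2, 48) returns a (2, 16) array. The per-pixel loop in `local_attention` is written for clarity rather than speed.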


Pages: 336-352

Page count: 17
