Exemplar-based video colorization with long-term spatiotemporal dependency

Cited by: 3
Authors
Chen, Siqi [1 ]
Li, Xueming [2 ]
Zhang, Xianlin [2 ]
Wang, Mingdao [1 ]
Zhang, Yu [1 ]
Han, Jiatong [1 ]
Zhang, Yue [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 102206, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Digital Media & Design Arts, Beijing 102206, Peoples R China
Keywords
Video colorization; Exemplar-based; Moving scenes; Long-term dependency; Spatiotemporal; Image colorization; Attention
DOI
10.1016/j.knosys.2023.111240
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Exemplar-based video colorization is an essential technique for applications such as old movie restoration. Although recent methods perform well in still scenes or scenes with regular movement, they often lack robustness in moving scenes because of their weak ability to model long-term dependency both spatially and temporally, leading to color fading, color discontinuity, or other artifacts. To solve this problem, we propose an exemplar-based video colorization framework with long-term spatiotemporal dependency. To enhance long-term spatial dependency, we design a parallelized CNN-Transformer block and a double-head non-local operation. The proposed CNN-Transformer block better integrates long-term spatial dependency with local texture and structural features, and the double-head non-local operation further exploits the augmented features. To enhance long-term temporal dependency, we introduce a novel Linkage subnet, which propagates motion information across adjacent frame blocks and helps maintain temporal continuity. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively, and that it generates more colorful, realistic, and stable results, especially for scenes where objects change greatly and irregularly.
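To make the spatial-dependency idea concrete, below is a minimal, hypothetical PyTorch sketch of a parallelized CNN-Transformer block: a local convolutional branch and a global self-attention branch run in parallel over the same feature map and are then fused. The class name, the fusion-by-1x1-convolution, and all hyperparameters are illustrative assumptions, not the paper's exact design (the double-head non-local operation and Linkage subnet are likewise omitted here).

```python
# Hypothetical sketch of a parallelized CNN-Transformer block (PyTorch).
# All names and design details are assumptions for illustration only.
import torch
import torch.nn as nn

class ParallelCNNTransformerBlock(nn.Module):
    """Runs a local CNN branch and a global self-attention branch in
    parallel over one feature map, then fuses them (assumed 1x1-conv
    fusion; the paper's block may fuse differently)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: captures texture and structure with small receptive fields.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Global branch: multi-head self-attention over all spatial positions,
        # modeling long-term spatial dependency.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate both branch outputs and mix with a 1x1 conv.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local_feat = self.local(x)
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        tokens = self.norm(tokens)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        out = self.fuse(torch.cat([local_feat, global_feat], dim=1))
        return out + x                          # residual connection

# Usage: a 64-channel feature map from a colorization encoder.
feat = torch.randn(1, 64, 32, 32)
block = ParallelCNNTransformerBlock(channels=64)
print(block(feat).shape)  # torch.Size([1, 64, 32, 32])
```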
Pages: 12