Exemplar-based video colorization with long-term spatiotemporal dependency

Cited by: 3
Authors
Chen, Siqi [1 ]
Li, Xueming [2 ]
Zhang, Xianlin [2 ]
Wang, Mingdao [1 ]
Zhang, Yu [1 ]
Han, Jiatong [1 ]
Zhang, Yue [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 102206, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Digital Media & Design Arts, Beijing 102206, Peoples R China
Keywords
Video colorization; Exemplar-based; Moving scenes; Long-term dependency; Spatiotemporal; Image colorization; Attention
DOI
10.1016/j.knosys.2023.111240
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Exemplar-based video colorization is an essential technique for applications such as old movie restoration. Although recent methods perform well in still scenes or scenes with regular movement, they often lack robustness in moving scenes because of their weak ability to model long-term dependency both spatially and temporally, leading to color fading, color discontinuity, or other artifacts. To solve this problem, we propose an exemplar-based video colorization framework with long-term spatiotemporal dependency. To enhance long-term spatial dependency, we design a parallelized CNN-Transformer block and a double-head non-local operation. The proposed CNN-Transformer block better integrates long-term spatial dependency with local texture and structural features, and the double-head non-local operation further exploits the augmented features. To enhance long-term temporal dependency, we introduce a novel Linkage subnet, which propagates motion information across adjacent frame blocks and helps maintain temporal continuity. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively, and that it generates more colorful, realistic, and stable results, especially for scenes where objects change greatly and irregularly.
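To make the spatial-dependency idea concrete, below is a minimal, hypothetical PyTorch sketch of a parallelized CNN-Transformer block: a local convolutional branch and a global self-attention branch run in parallel over the same feature map and are then fused. The class name, the fusion-by-1x1-convolution, and all hyperparameters are illustrative assumptions, not the paper's exact design (the double-head non-local operation and Linkage subnet are likewise omitted here).

```python
# Hypothetical sketch of a parallelized CNN-Transformer block (PyTorch).
# All names and design details are assumptions for illustration only.
import torch
import torch.nn as nn

class ParallelCNNTransformerBlock(nn.Module):
    """Runs a local CNN branch and a global self-attention branch in
    parallel over one feature map, then fuses them (assumed 1x1-conv
    fusion; the paper's block may fuse differently)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: captures texture and structure with small receptive fields.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Global branch: multi-head self-attention over all spatial positions,
        # modeling long-term spatial dependency.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate both branch outputs and mix with a 1x1 conv.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local_feat = self.local(x)
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        tokens = self.norm(tokens)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        out = self.fuse(torch.cat([local_feat, global_feat], dim=1))
        return out + x                          # residual connection

# Usage: a 64-channel feature map from a colorization encoder.
feat = torch.randn(1, 64, 32, 32)
block = ParallelCNNTransformerBlock(channels=64)
print(block(feat).shape)  # torch.Size([1, 64, 32, 32])
```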
Pages: 12