MMC: Multi-modal colorization of images using textual description

Cited by: 0

Authors:

Ghosh, Subhankar [1]

Bhattacharya, Saumik [2]

Roy, Prasun [1]

Pal, Umapada [3]

Blumenstein, Michael [1]

Affiliations:

[1] Univ Technol Sydney, Fac Engn & IT, Ultimo, NSW, Australia

[2] Indian Inst Technol Kharagpur, E&ECE Dept, Kharagpur, India

[3] Indian Stat Inst, CVPR Unit, Kolkata, India

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2025, Vol. 19, Issue 1

Keywords:

Colourisation; Generation; Text-information; Multi-modal colourisation

DOI:

10.1007/s11760-024-03650-y

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Handling multiple objects with different colours is a significant challenge for image colourisation techniques. As a result, existing colourisation algorithms often fail to maintain colour consistency in complex real-world scenes. In this work, we integrate textual descriptions as an auxiliary condition, alongside the greyscale image to be colourised, to improve the fidelity of the colourisation process. We propose a deep network that takes two inputs (the greyscale image and the corresponding encoded text description) and predicts the relevant colour components. We also identify each object in the image and colourise it using its individual description, so that object-specific attributes are incorporated into the colourisation process. A fusion model then combines all the image objects (segments) to generate the final colourised image. Because the textual descriptions contain colour information about the objects in the image, the text encoding helps improve the overall quality of the predicted colours. The proposed method outperforms existing colourisation techniques on the LPIPS, PSNR and SSIM metrics.
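Of the three metrics named in the abstract, PSNR has a simple closed form: 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and the predicted image. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `psnr`, the 8-bit peak value of 255, and the sample pixel values are assumptions chosen for illustration.

```python
import math


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical flattened pixel values, for illustration only.
ground_truth = [52, 55, 61, 59]
colourised = [54, 55, 60, 57]
score = psnr(ground_truth, colourised)
```

A higher PSNR indicates a colourised output closer to the ground truth. LPIPS and SSIM depend on learned features and local window statistics respectively, so they are not reproduced here.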


Pages: 10
