Image Harmonization Guided by Semantic Information

Cited by: 0

Authors:

Yang Z.-Y. [1,2]

Li P.-C. [1,2]

Liu F.-C. [1,2]

Gao C.-Q. [1,2]

Affiliations:

[1] School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing

[2] Chongqing Key Laboratory of Signal and Information Processing, Chongqing

Source:

Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2023 / Vol. 51 / No. 07

Funding:

National Natural Science Foundation of China

Keywords:

image harmonization; image processing; local background information; multi-resolution selective fusion; semantic information; spatial feature information

DOI:

10.12263/DZXB.20221322

Abstract:

Image harmonization is an important task in image processing. It aims to adjust the appearance of the foreground (e.g., illumination, color, and texture) so that it is visually consistent with the background. However, existing deep learning-based methods usually use the feature distribution of the whole image background as the cue for adjusting the foreground and overlook the critical role of semantic information in foreground alignment, so local areas of the foreground still look visually different from the background. To address this, building on the multi-resolution selective fusion module (MRSFM) and the lightweight convolutional block attention module (CBAM), this paper designs a multi-resolution selective fusion module based on a dual attention mechanism (MRSF-DAM). Its output feature map is rich in semantic information, which guides the network to better understand the correlation between the foreground of an image and its surrounding scene, enables the network to obtain from the background the information needed to harmonize the foreground, and ultimately reduces the visual discrepancy between the foreground and background regions of the image. In addition, this paper designs a new network architecture that selectively fuses shallow and deep feature information. The output feature maps of the first six decoder layers and of MRSF-DAM are fused across scales and enhanced, and the resulting enhanced feature maps are fed into the final decoder layer. This alleviates the problem of skip connections introducing features unrelated to the foreground, and it also reduces the loss of spatial feature information caused by repeated downsampling in the decoder, further improving the realism of the generated harmonized images. Extensive experiments on the widely used iHarmony4 benchmark dataset verify the effectiveness of the proposed method.
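The abstract does not specify the internals of MRSF-DAM, so the following is only a minimal sketch under stated assumptions. An SK-style selective fusion (per-channel softmax weights over K branches, computed from a globally pooled descriptor of their sum) stands in for MRSFM, and it is followed by CBAM's channel attention and then its spatial attention, the order used by Woo et al. [35]. The branches are assumed to be already resampled to one resolution, all weights are plain lists rather than learned parameters, and the names `selective_fusion`, `channel_attention`, `spatial_attention`, and `mrsf_dam` are hypothetical. Pure Python is used so the data flow is visible without a deep learning framework.

```python
import math
from typing import List

# Feature map stored as C x H x W nested lists (no batch dimension).
FeatureMap = List[List[List[float]]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def selective_fusion(branches: List[FeatureMap], logit_weights: List[List[float]]) -> FeatureMap:
    """Hypothetical SK-style stand-in for MRSFM over K same-sized branches.

    logit_weights[k][c] turns channel c's pooled descriptor into branch k's logit.
    """
    c, h, w = len(branches[0]), len(branches[0][0]), len(branches[0][0][0])
    out = []
    for ch in range(c):
        # Global average pooling of the element-wise sum of all branches.
        desc = sum(b[ch][i][j] for b in branches for i in range(h) for j in range(w)) / (h * w)
        # Softmax across branches gives this channel's selection weights.
        logits = [lw[ch] * desc for lw in logit_weights]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([[sum(wk * b[ch][i][j] for wk, b in zip(weights, branches))
                     for j in range(w)] for i in range(h)])
    return out


def channel_attention(x: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """CBAM channel attention: sigmoid(MLP(avg-pool) + MLP(max-pool)) rescales channels.

    w1 has shape (C // r) x C and w2 has shape C x (C // r) for reduction ratio r.
    """
    def mlp(v: List[float]) -> List[float]:
        hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]  # ReLU
        return [sum(a * b for a, b in zip(row, hidden)) for row in w2]

    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    scale = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[s * v for v in row] for row in ch] for s, ch in zip(scale, x)]


def spatial_attention(x: FeatureMap, kernel: List[List[List[float]]]) -> FeatureMap:
    """CBAM spatial attention: k x k conv over [channel-mean; channel-max], then sigmoid.

    kernel has shape 2 x k x k with k odd; zero padding keeps the H x W size.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    mean_map = [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(x[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    pooled = [mean_map, max_map]
    k = len(kernel[0])
    pad = k // 2
    attn = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernel[p][di][dj] * pooled[p][ii][jj]
            attn[i][j] = sigmoid(acc)
    return [[[attn[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in x]


def mrsf_dam(branches, logit_weights, w1, w2, kernel) -> FeatureMap:
    """Hypothetical composition: selective fusion, then channel and spatial attention."""
    fused = selective_fusion(branches, logit_weights)
    return spatial_attention(channel_attention(fused, w1, w2), kernel)
```

With all-zero weights, every softmax is uniform and every sigmoid equals 0.5, so `mrsf_dam` returns a quarter of the branch average; this is a convenient sanity check, while the real module would learn these weights end to end.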
Compared with the latest method, SCS-Co (Self-Consistent Style Contrastive Learning for Image Harmonization), the proposed method reduces the mean squared error (MSE) and foreground mean squared error (fMSE) on the entire dataset by 4.28 and 61.97, respectively, and increases the peak signal-to-noise ratio (PSNR) by 1 dB. © 2023 Chinese Institute of Electronics. All rights reserved.
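The abstract reports MSE, fMSE, and PSNR without restating their definitions. The sketch below follows the convention commonly used in iHarmony4 evaluations, which is an assumption here because the paper's evaluation script is not given: errors are measured on the 0-255 scale, fMSE averages squared errors over foreground entries only, and PSNR = 10 log10(255^2 / MSE). The function names `mse`, `fmse`, and `psnr` are illustrative.

```python
import math
from typing import Sequence

# Images are flattened to one value per pixel channel on the 0-255 scale; the
# mask holds 1 for foreground and 0 for background entries (repeat a pixel's
# mask value for each of its channels).


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over every pixel value in the image."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def fmse(pred: Sequence[float], target: Sequence[float], mask: Sequence[int]) -> float:
    """Mean squared error restricted to foreground entries (mask == 1)."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not errors:
        raise ValueError("mask selects no foreground pixels")
    return sum(errors) / len(errors)


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for a perfect match."""
    err = mse(pred, target)
    return math.inf if err == 0 else 10.0 * math.log10(max_val ** 2 / err)


pred = [10.0, 20.0, 30.0, 40.0]
target = [10.0, 20.0, 33.0, 44.0]
mask = [0, 0, 1, 1]
print(mse(pred, target), fmse(pred, target, mask))  # 6.25 12.5
```

Because fMSE divides only by the number of foreground entries, it is not diluted by the background, so for small foregrounds it is typically much larger than the whole-image MSE; this is why the two metrics are reported on very different scales.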

References

Pages: 1826-1834

Number of pages: 8

38 references in total

[31]

CUN X D, PUN C M., Improving the harmony of the composite image by spatial-separated attention module, IEEE Transactions on Image Processing, 29, pp. 4759-4771, (2020)

[32]

HU J, SHEN L, SUN G., Squeeze-and-excitation networks, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132-7141, (2018)

[33]

HU J, SHEN L, ALBANIE S, et al., Gather-excite: Exploiting feature context in convolutional neural networks, Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9423-9433, (2018)

[34]

WANG X L, GIRSHICK R, GUPTA A, et al., Non-local neural networks, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7794-7803, (2018)

[35]

WOO S, PARK J, LEE J Y, et al., CBAM: Convolutional block attention module, Computer Vision - ECCV 2018, pp. 3-19, (2018)

[36]

YANG Z X, ZHU L C, WU Y, et al., Gated channel transformation for visual recognition, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11791-11800, (2020)

[37]

ZHANG R., Making convolutional networks shift-invariant again, Proceedings of International Conference on Machine Learning, pp. 7324-7334, (2019)

[38]

IOFFE S, SZEGEDY C., Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, 37, pp. 448-456, (2015)
