Image Harmonization Guided by Semantic Information

Cited by: 0
Authors
Yang Z.-Y. [1 ,2 ]
Li P.-C. [1 ,2 ]
Liu F.-C. [1 ,2 ]
Gao C.-Q. [1 ,2 ]
Affiliations
[1] School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing
[2] Chongqing Key Laboratory of Signal and Information Processing, Chongqing
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2023, Vol. 51, No. 07
Fund
National Natural Science Foundation of China;
Keywords
image harmonization; image processing; local background information; multi-resolution selective fusion; semantic information; spatial feature information;
DOI
10.12263/DZXB.20221322
Abstract
Image harmonization occupies an important position in image processing. It aims to adjust the foreground appearance, e.g., illumination, color, and texture, to be visually consistent with the background. However, existing deep learning-based methods usually use the feature distribution of the overall image background as a cue for adjusting the foreground, without attending to the critical role of semantic information in foreground alignment, so local areas of the foreground still appear visually different from the background. To this end, building on the multi-resolution selective fusion module (MRSFM) and the lightweight convolutional block attention module (CBAM), this paper designs a multi-resolution selective fusion module based on a dual attention mechanism (MRSF-DAM), which makes the final output feature map rich in semantic information. This guides the network to better understand the correlation between the foreground of an image and its surrounding scene, enables the network to fully extract from the background the information needed to harmonize the foreground, and ultimately reduces the visual discrepancy between the foreground and background regions of an image. In addition, this paper designs a new network architecture that selectively fuses shallow and deep feature information: the output feature maps of the first six decoder layers and of MRSF-DAM are fused and enhanced at multiple scales, and the resulting enhanced feature maps are fed into the final decoder layer. This alleviates the problem of skip connections introducing features unrelated to the foreground and reduces the loss of spatial feature information caused by repeated downsampling, further improving the realism of the generated harmonized images. Extensive experiments on the widely used iHarmony4 benchmark dataset verify the effectiveness of the method. Compared with the latest method SCS-Co (self-consistent style contrastive learning for image harmonization), the proposed method improves the mean squared error (MSE), foreground mean squared error (fMSE), and peak signal-to-noise ratio (PSNR) over the entire dataset by 4.28, 61.97, and 1 dB, respectively. © 2023 Chinese Institute of Electronics. All rights reserved.
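For reference, the metrics quoted above are the ones commonly reported on iHarmony4: MSE over the whole image, fMSE averaged only over the composited foreground region indicated by the mask, and PSNR derived from the full-image MSE. The snippet below is a minimal illustrative sketch of how these three quantities are typically computed for a single image; the function and variable names are ours, not from the paper.

```python
import numpy as np

def harmonization_metrics(pred, target, mask, max_val=255.0):
    """Compute MSE, foreground MSE (fMSE) and PSNR for one image.

    pred, target: HxWx3 arrays with values in [0, max_val]
    mask:         HxW binary array, 1 inside the composited foreground
    """
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    sq_err = (pred - target) ** 2                     # per-pixel squared error

    mse = sq_err.mean()                               # averaged over all pixels and channels

    fg = mask.astype(bool)
    # fMSE: the same squared error, averaged only over foreground pixels
    fmse = sq_err[fg].mean() if fg.any() else 0.0

    psnr = 10.0 * np.log10(max_val ** 2 / max(mse, 1e-10))
    return mse, fmse, psnr

# Usage example with a synthetic composite and its ground truth
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(0, 255, (256, 256, 3))
    comp = np.clip(gt + rng.normal(0, 5, gt.shape), 0, 255)  # slight appearance mismatch
    m = np.zeros((256, 256))
    m[64:192, 64:192] = 1                             # square foreground region
    print(harmonization_metrics(comp, gt, m))
```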
Pages: 1826-1834
Page count: 8