Adversarial Learning with Mask Reconstruction for Text-Guided Image Inpainting

Cited by: 4
Authors
Wu, Xingcai [1 ]
Xie, Yucheng [1 ]
Zeng, Jiaqi [1 ]
Yang, Zhenguo [1 ]
Yu, Yi [2 ]
Li, Qing [3 ]
Liu, Wenyin [1 ,4 ]
Affiliations
[1] Guangdong Univ Technol, Sch Comp Sci, Guangzhou, Peoples R China
[2] Natl Inst Informat, Digital Content & Media Sci Res Div, Tokyo, Japan
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[4] Peng Cheng Lab, Cyberspace Secur Res Ctr, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
National Natural Science Foundation of China
Keywords
Text-guided image inpainting; Object mask; Textual and visual semantics
DOI
10.1145/3474085.3475506
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Text-guided image inpainting aims to complete corrupted patches so that they are coherent with both the visual and textual context. On the one hand, existing works focus on the pixels surrounding the corrupted patches without considering the objects in the image, so the characteristics of objects described in the text may be painted onto non-object regions. On the other hand, redundant information in the text may distract the generation of the objects of interest in the restored image. In this paper, we propose an adversarial learning framework with mask reconstruction (ALMR) for image inpainting with textual guidance, which consists of a two-stage generator and dual discriminators. The two-stage generator restores coarse-grained and fine-grained images, respectively. In particular, we devise a dual-attention module (DAM) that incorporates word-level and sentence-level textual features as guidance for generating the coarse-grained and fine-grained details in the two stages. Furthermore, we design a mask reconstruction module (MRM) to penalize the restoration of the objects of interest given the textual descriptions of those objects. For adversarial training, we exploit global and local discriminators for the whole image and the corrupted patches, respectively. Extensive experiments on CUB-200-2011, Oxford-102 and CelebA-HQ show that the proposed ALMR outperforms existing methods (e.g., the FID value is reduced from 29.69 to 14.69 compared with the state-of-the-art approach on CUB-200-2011). Code is available at https://github.com/GaranWu/ALMR
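The adversarial setup sketched in the abstract combines a global discriminator (whole image), a local discriminator (corrupted patches), and a mask-reconstruction penalty. The snippet below is a minimal, dependency-free sketch of how such a combined objective could be weighted; the hinge-loss form, the weights `lambda_local` and `lambda_mask`, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of a dual-discriminator adversarial objective with a
# mask-reconstruction penalty. Scores are plain floats standing in for
# discriminator outputs; in practice these would be tensors from a network.

def hinge_d_loss(real_scores, fake_scores):
    """Hinge loss for one discriminator: real scores are pushed above +1,
    fake (generated) scores below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def generator_loss(global_fake, local_fake, mask_l1,
                   lambda_local=1.0, lambda_mask=10.0):
    """Generator objective: fool both the global and the local discriminator
    (maximize their scores on generated content) while keeping an L1
    reconstruction penalty on the object-mask region small.
    The weights are assumed values for illustration."""
    adv_global = -sum(global_fake) / len(global_fake)
    adv_local = -sum(local_fake) / len(local_fake)
    return adv_global + lambda_local * adv_local + lambda_mask * mask_l1
```

A perfectly fooled discriminator pair (high scores on fakes) drives the adversarial terms negative, while the mask term keeps the restored object anchored to the text-described region.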
Pages: 3464-3472 (9 pages)
Cited References
28 references in total
[1]   PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing [J].
Barnes, Connelly ;
Shechtman, Eli ;
Finkelstein, Adam ;
Goldman, Dan B. .
ACM TRANSACTIONS ON GRAPHICS, 2009, 28 (03)
[2]  
Wah, Catherine; Branson, Steve, 2011, CALTECH UCSD BIRDS 2
[3]  
Chen HT, 2018, IEEE ANN INT CONF CY, P87, DOI 10.1109/CYBER.2018.8688041
[4]  
Efros AA, 2001, COMP GRAPH, P341, DOI 10.1145/383259.383296
[5]  
Goodfellow, Ian J., 2014, INT C LEARNING REPRE
[6]   Globally and Locally Consistent Image Completion [J].
Iizuka, Satoshi ;
Simo-Serra, Edgar ;
Ishikawa, Hiroshi .
ACM TRANSACTIONS ON GRAPHICS, 2017, 36 (04)
[7]  
Karras, T., 2017, arXiv:1710.10196
[8]  
King DB, 2015, ACS SYM SER, V1214, P1, DOI 10.1021/bk-2015-1214.ch001
[9]  
Li C, 2016, IEEE INT SEMICONDUCT
[10]   Recurrent Feature Reasoning for Image Inpainting [J].
Li, Jingyuan ;
Wang, Ning ;
Zhang, Lefei ;
Du, Bo ;
Tao, Dacheng .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :7757-7765