Learning From Text: A Multimodal Face Inpainting Network for Irregular Holes

Times Cited: 1
Authors
Zhan, Dandan [1 ]
Wu, Jiahao [1 ]
Luo, Xing [2 ]
Jin, Zhi [1 ,3 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Peng Cheng Lab, Dept Math & Theories, Shenzhen 518055, Guangdong, Peoples R China
[3] Guangdong Prov Key Lab Fire Sci & Technol, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Faces; Face recognition; Feature extraction; Visualization; Transformers; Task analysis; Semantics; Face inpainting; irregular hole; multimodality; text description;
DOI
10.1109/TCSVT.2024.3370578
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Irregular hole face inpainting is a challenging task, since the appearance of faces varies greatly (e.g., different expressions and poses) and human vision is more sensitive to subtle blemishes in inpainted face images. Without external information, most existing methods struggle to generate new content containing semantic information for face components when sufficient contextual information is absent. Text, which can describe the content of an image in most cases, is flexible and user-friendly. In this work, a concise and effective Multimodal Face Inpainting Network (MuFIN) is proposed, which simultaneously utilizes the known regions and the descriptive text of the input image to address irregular hole face inpainting. To fully exploit the remaining parts of corrupted face images, a plug-and-play Multi-scale Multi-level Skip Fusion Module (MMSFM) is designed, which extracts multi-scale features and fuses shallow features into deep features at multiple levels. Moreover, to bridge the gap between textual and visual modalities and effectively fuse cross-modal features, a Multi-scale Text-Image Fusion Block (MTIFB) is developed, which incorporates text features into image features at both local and global scales. Extensive experiments conducted on two commonly used datasets, CelebA and Multi-Modal-CelebA-HQ, demonstrate that our method outperforms state-of-the-art methods both qualitatively and quantitatively, and generates realistic and controllable results.
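The abstract describes the MTIFB as incorporating text features into image features. A common generic mechanism for this kind of cross-modal fusion is cross-attention, in which image tokens query text tokens. The sketch below illustrates that generic idea only; it is not the paper's actual MTIFB implementation, and all function and variable names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(img_feats, txt_feats):
    """Generic text-to-image fusion sketch (illustrative, not MuFIN's MTIFB).

    img_feats: (N_img, d) image tokens acting as queries.
    txt_feats: (N_txt, d) text tokens acting as keys/values.
    Returns image features with text information mixed in via a
    residual connection.
    """
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)  # (N_img, N_txt)
    attn = softmax(scores, axis=-1)                # each image token attends to text
    return img_feats + attn @ txt_feats            # residual cross-modal fusion
```

With all-zero text features, the attention contribution vanishes and the image features pass through unchanged, which is one reason a residual formulation is a common design choice for optional conditioning signals.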
Pages: 7484-7497
Page count: 14