Learning From Text: A Multimodal Face Inpainting Network for Irregular Holes

Times cited: 1
Authors
Zhan, Dandan [1 ]
Wu, Jiahao [1 ]
Luo, Xing [2 ]
Jin, Zhi [1 ,3 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Peng Cheng Lab, Dept Math & Theories, Shenzhen 518055, Guangdong, Peoples R China
[3] Guangdong Prov Key Lab Fire Sci & Technol, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Faces; Face recognition; Feature extraction; Visualization; Transformers; Task analysis; Semantics; Face inpainting; irregular hole; multimodality; text description;
DOI
10.1109/TCSVT.2024.3370578
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline codes
0808; 0809
Abstract
Irregular hole face inpainting is a challenging task, since the appearance of faces varies greatly (e.g., different expressions and poses) and human vision is highly sensitive to subtle blemishes in inpainted face images. Without external information, most existing methods struggle to generate new, semantically meaningful content for face components when the surrounding context is insufficient. Text, on the other hand, can describe the content of an image in most cases and is flexible and user-friendly. In this work, a concise and effective Multimodal Face Inpainting Network (MuFIN) is proposed, which simultaneously exploits the known regions and the descriptive text of the input image to address irregular hole face inpainting. To fully exploit the remaining parts of corrupted face images, a plug-and-play Multi-scale Multi-level Skip Fusion Module (MMSFM) is designed, which extracts multi-scale features and fuses shallow features into deep features at multiple levels. Moreover, to bridge the gap between the textual and visual modalities and effectively fuse cross-modal features, a Multi-scale Text-Image Fusion Block (MTIFB) is developed, which incorporates text features into image features at both local and global scales. Extensive experiments on two commonly used datasets, CelebA and Multi-Modal-CelebA-HQ, demonstrate that our method outperforms state-of-the-art methods both qualitatively and quantitatively, and can generate realistic and controllable results.
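The abstract names two architectural components: a skip-fusion module (MMSFM) that injects multi-scale shallow features into deep features, and a text-image fusion block (MTIFB) that fuses text features into image features at local and global scales. The PyTorch sketch below only illustrates these two general ideas under assumed layer sizes, branch counts, and fusion details; the module names, hyperparameters, and fusion strategy here are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of the two module ideas named in the abstract.
# All sizes, branch choices, and fusion details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSkipFusion(nn.Module):
    """Rough analogue of the MMSFM idea: extract multi-scale features from a
    shallow (encoder) feature map and fuse them into a deeper feature map."""

    def __init__(self, shallow_ch, deep_ch):
        super().__init__()
        # Parallel branches with different receptive fields (assumed design).
        self.branches = nn.ModuleList([
            nn.Conv2d(shallow_ch, deep_ch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        self.fuse = nn.Conv2d(deep_ch * 3 + deep_ch, deep_ch, kernel_size=1)

    def forward(self, shallow, deep):
        # Resize the shallow skip feature to the deep feature's resolution.
        shallow = F.interpolate(shallow, size=deep.shape[-2:],
                                mode="bilinear", align_corners=False)
        multi_scale = [branch(shallow) for branch in self.branches]
        return self.fuse(torch.cat(multi_scale + [deep], dim=1))


class TextImageFusion(nn.Module):
    """Rough analogue of the MTIFB idea: inject text features into image
    features at a local scale (spatial cross-attention) and a global scale
    (sentence-level channel modulation)."""

    def __init__(self, img_ch, text_dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(img_ch, heads, kdim=text_dim,
                                          vdim=text_dim, batch_first=True)
        self.global_gate = nn.Linear(text_dim, img_ch)

    def forward(self, img_feat, text_tokens):
        b, c, h, w = img_feat.shape
        # Local fusion: each spatial location attends to the text tokens.
        queries = img_feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
        local, _ = self.attn(queries, text_tokens, text_tokens)
        local = local.transpose(1, 2).view(b, c, h, w)
        # Global fusion: a sentence-level vector rescales channels.
        gate = torch.sigmoid(self.global_gate(text_tokens.mean(dim=1)))
        return img_feat + local * gate.view(b, c, 1, 1)


if __name__ == "__main__":
    skip = MultiScaleSkipFusion(shallow_ch=64, deep_ch=128)
    fuse = TextImageFusion(img_ch=128, text_dim=512)
    shallow = torch.randn(1, 64, 64, 64)
    deep = torch.randn(1, 128, 32, 32)
    text = torch.randn(1, 16, 512)   # e.g., text-encoder token embeddings
    out = fuse(skip(shallow, deep), text)
    print(out.shape)                 # torch.Size([1, 128, 32, 32])
```

The residual form `img_feat + local * gate` is one plausible way to keep the image pathway dominant when the text description is uninformative; the paper's actual fusion rule may differ.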
Pages: 7484-7497
Page count: 14
Related papers
50 records in total
  • [31] Scale-Residual Learning Network for Scene Text Detection
    Cai, Yuanqiang
    Liu, Chang
    Cheng, Peirui
    Du, Dawei
    Zhang, Libo
    Wang, Weiqiang
    Ye, Qixiang
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07) : 2725 - 2738
  • [32] Visible and Infrared Object Tracking via Convolution-Transformer Network With Joint Multimodal Feature Learning
    Qiu, Jiazhu
    Yao, Rui
    Zhou, Yong
    Wang, Peng
    Zhang, Yanning
    Zhu, Hancheng
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [33] DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation With Application to e-Learning
    Daza, Roberto
    Gomez, Luis F.
    Fierrez, Julian
    Morales, Aythami
    Tolosana, Ruben
    Ortega-Garcia, Javier
    [J]. IEEE ACCESS, 2024, 12 : 111343 - 111359
  • [34] Two-Stream Prototype Learning Network for Few-Shot Face Recognition Under Occlusions
    Yang, Xingyu
    Han, Mengya
    Luo, Yong
    Hu, Han
    Wen, Yonggang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1555 - 1563
  • [35] JDSR-GAN: Constructing an Efficient Joint Learning Network for Masked Face Super-Resolution
    Gao, Guangwei
    Tang, Lei
    Wu, Fei
    Lu, Huimin
    Yang, Jian
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1505 - 1512
  • [36] A Deep Attentive Multimodal Learning Approach for Disaster Identification From Social Media Posts
    Hossain, Eftekhar
    Hoque, Mohammed Moshiul
    Hoque, Enamul
    Islam, Md Saiful
    [J]. IEEE ACCESS, 2022, 10 : 46538 - 46551
  • [37] Generative Text Convolutional Neural Network for Hierarchical Document Representation Learning
    Wang, Chaojie
    Chen, Bo
    Duan, Zhibin
    Chen, Wenchao
    Zhang, Hao
    Zhou, Mingyuan
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04) : 4586 - 4604
  • [38] 3D Face From X: Learning Face Shape From Diverse Sources
    Guo, Yudong
    Cai, Lin
    Zhang, Juyong
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 3815 - 3827
  • [39] Interpretable Local Frequency Binary Pattern (LFrBP) Based Joint Continual Learning Network for Heterogeneous Face Recognition
    Roy, Hiranmoy
    Bhattacharjee, Debotosh
    Krejcar, Ondrej
    [J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2022, 17 : 2125 - 2136
  • [40] A Contrastive Learning Enhanced Adaptive Multimodal Fusion Network for Hyperspectral and LiDAR Data Classification
    Xu, Kai
    Wang, Bangjun
    Zhu, Zhou
    Jia, Zhaohong
    Fan, Chengcheng
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63