Learning From Text: A Multimodal Face Inpainting Network for Irregular Holes

Cited by: 1
Authors
Zhan, Dandan [1 ]
Wu, Jiahao [1 ]
Luo, Xing [2 ]
Jin, Zhi [1 ,3 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Peng Cheng Lab, Dept Math & Theories, Shenzhen 518055, Guangdong, Peoples R China
[3] Guangdong Prov Key Lab Fire Sci & Technol, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Faces; Face recognition; Feature extraction; Visualization; Transformers; Task analysis; Semantics; Face inpainting; irregular hole; multimodality; text description;
DOI
10.1109/TCSVT.2024.3370578
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Face inpainting with irregular holes is a challenging task, since the appearance of faces varies greatly (e.g., across expressions and poses) and human vision is highly sensitive to subtle blemishes in inpainted face images. Without external information, most existing methods struggle to generate semantically meaningful content for face components when contextual information is insufficient. Text, however, can describe the content of an image in most cases and is flexible and user-friendly. In this work, a concise and effective Multimodal Face Inpainting Network (MuFIN) is proposed, which simultaneously exploits the known regions of the input image and its descriptive text to address irregular hole face inpainting. To fully exploit the remaining parts of the corrupted face images, a plug-and-play Multi-scale Multi-level Skip Fusion Module (MMSFM) is designed, which extracts multi-scale features and fuses shallow features into deep features at multiple levels. Moreover, to bridge the gap between the textual and visual modalities and effectively fuse cross-modal features, a Multi-scale Text-Image Fusion Block (MTIFB) is developed, which incorporates text features into image features at both local and global scales. Extensive experiments on two commonly used datasets, CelebA and Multi-Modal-CelebA-HQ, demonstrate that our method outperforms state-of-the-art methods both qualitatively and quantitatively, and can generate realistic and controllable results.
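The abstract describes the two fusion modules only at a high level. As a rough illustration of the kind of text-image fusion the MTIFB description suggests, the sketch below shows a minimal, hypothetical PyTorch module: image features attend to text token embeddings once at full resolution (a local scale) and once on a pooled map (a global scale), and the two branches are merged with a residual connection. All names, dimensions, and design choices here (TextImageFusionSketch, img_dim, the 8x8 pooling grid, etc.) are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of local/global text-image fusion (not the authors' code).
import torch
import torch.nn as nn


class TextImageFusionSketch(nn.Module):
    def __init__(self, img_dim=256, txt_dim=512, heads=4):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, img_dim)        # align text dim to image dim
        self.local_attn = nn.MultiheadAttention(img_dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(img_dim, heads, batch_first=True)
        self.pool = nn.AdaptiveAvgPool2d(8)                # coarse grid for the global branch
        self.out = nn.Conv2d(img_dim * 2, img_dim, kernel_size=1)

    def forward(self, img_feat, txt_feat):
        # img_feat: (B, C, H, W) image features; txt_feat: (B, T, txt_dim) text token embeddings
        b, c, h, w = img_feat.shape
        txt = self.txt_proj(txt_feat)                      # (B, T, C)

        # Local branch: every spatial location attends to the text tokens.
        q_local = img_feat.flatten(2).transpose(1, 2)      # (B, H*W, C)
        local, _ = self.local_attn(q_local, txt, txt)
        local = local.transpose(1, 2).view(b, c, h, w)

        # Global branch: pooled features attend to the text, then upsample back.
        pooled = self.pool(img_feat)                       # (B, C, 8, 8)
        q_global = pooled.flatten(2).transpose(1, 2)       # (B, 64, C)
        glob, _ = self.global_attn(q_global, txt, txt)
        glob = glob.transpose(1, 2).view(b, c, 8, 8)
        glob = nn.functional.interpolate(glob, size=(h, w),
                                         mode="bilinear", align_corners=False)

        # Merge the two branches and keep a residual path to the input features.
        fused = self.out(torch.cat([local, glob], dim=1))
        return img_feat + fused


if __name__ == "__main__":
    block = TextImageFusionSketch()
    img = torch.randn(2, 256, 32, 32)                      # toy image feature map
    txt = torch.randn(2, 16, 512)                          # toy text embeddings (16 tokens)
    print(block(img, txt).shape)                           # torch.Size([2, 256, 32, 32])
```

The residual formulation keeps the block plug-and-play: if the text contributes little for a given region, the fused output can stay close to the original image features.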
Pages: 7484 - 7497
Number of pages: 14