A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching

Cited by: 4
Authors
Shang, Heng [1]
Zhao, Guoshuai [1]
Shi, Jing [1]
Qian, Xueming [2]
Affiliations
[1] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China
[2] Xi'an Jiaotong University, SMILES Lab, Xi'an 710049, People's Republic of China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Feature extraction; Semantics; Text mining; Intelligent systems; Image representation; Task analysis; Image edge detection
DOI
10.1109/MIS.2023.3265176
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In image-text matching, one key to improving performance is extracting features that carry more semantic information. Existing work shows that semantic enrichment through knowledge expansion can improve performance. Most such methods expand image features; however, the shortage of semantic information in the text modality and the one-sidedness of a single view are often bottlenecks that limit the performance of image-text matching models. To address these two problems, we aggregate knowledge from multiple views and propose a word imagination graph (WIG). A WIG expands textual semantic information by imagination based on the input images. Using the WIG, we then construct a novel multiview text imagination network (MTIN). An MTIN enables latent alignment of images and texts on tags, which assists matching at the semantic level. Results on the Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code has been released on GitHub: https://github.com/smileslabsh/Multiview-Text-Imagination-Network
Pages: 41-50
Page count: 10