A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching

Cited by: 4
Authors
Shang, Heng [1]
Zhao, Guoshuai [1]
Shi, Jing [1]
Qian, Xueming [2]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
[2] Xi An Jiao Tong Univ, SMILES Lab, Xian 710049, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Feature extraction; Semantics; Text mining; Intelligent systems; Image representation; Task analysis; Image edge detection;
DOI
10.1109/MIS.2023.3265176
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In image-text matching, one key to improving performance is extracting features that carry more semantic information. Existing work demonstrates that semantic enrichment through knowledge expansion can improve performance. Most of these methods expand image features; however, the shortage of semantic information in the text modality and the one-sidedness of a single view are often bottlenecks that limit the performance of image-text matching models. To solve these two problems, we aggregate knowledge from multiple views and propose a word imagination graph (WIG). A WIG can be used to expand textual semantic information by imagination based on input images. Then, utilizing the WIG, we construct a novel multiview text imagination network (MTIN). An MTIN enables latent alignment of images and texts on tags, which can assist matching at the semantic level. Results on the Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code has been released on GitHub: https://github.com/smileslabsh/Multiview-Text-Imagination-Network.
Pages: 41-50
Page count: 10