A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching

Cited by: 4
Authors
Shang, Heng [1]
Zhao, Guoshuai [1]
Shi, Jing [1]
Qian, Xueming [2]
Affiliations
[1] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China
[2] Xi'an Jiaotong University, SMILES Lab, Xi'an 710049, People's Republic of China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Feature extraction; Semantics; Text mining; Intelligent systems; Image representation; Task analysis; Image edge detection
DOI
10.1109/MIS.2023.3265176
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In image-text matching, one of the keys to improving performance is extracting features that carry more semantic information. Existing works demonstrate that semantic enrichment through knowledge expansion can improve performance. Most of them expand image features. However, two problems often limit the performance of image-text matching models: the shortage of semantic information in the text modality and the reliance on a single view. To address these two problems, we aggregate knowledge from multiple views and propose a word imagination graph (WIG). A WIG expands the semantic information of a text through imagination based on the input image. Building on the WIG, we then construct a novel multiview text imagination network (MTIN). An MTIN enables latent alignment of images and texts on tags, which assists matching at the semantic level. Results on the Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code has been released on GitHub at https://github.com/smileslabsh/Multiview-Text-Imagination-Network.
Pages: 41-50
Number of pages: 10
Related papers
  • [1] Reference-Aware Adaptive Network for Image-Text Matching
    Xiong, Guoxin
    Meng, Meng
    Zhang, Tianzhu
    Zhang, Dongming
    Zhang, Yongdong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9678 - 9691
  • [2] A Mutually Textual and Visual Refinement Network for Image-Text Matching
    Pang, Shanmin
    Zeng, Yueyang
    Zhao, Jiawei
    Xue, Jianru
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7555 - 7566
  • [3] Image-text matching algorithm based on multi-level semantic alignment
    Li Y.
    Yao T.
    Zhang L.
    Sun Y.
    Fu H.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (02) : 551 - 558
  • [4] News Image-Text Matching With News Knowledge Graph
    Zhao, Yumeng
    Yun, Jing
    Gao, Shuo
    Liu, Limin
    IEEE ACCESS, 2021, 9 : 108017 - 108027
  • [5] Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
    Yang, Cong
    Li, Zuchao
    Zhang, Lefei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12
  • [6] Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching
    Dong, Xinfeng
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6437 - 6447
  • [7] Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching
    Zhang, Kun
    Mao, Zhendong
    Liu, An-An
    Zhang, Yongdong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1320 - 1332
  • [8] Region Reinforcement Network With Topic Constraint for Image-Text Matching
    Wu, Jie
    Wu, Chunlei
    Lu, Jing
    Wang, Leiquan
    Cui, Xuerong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 388 - 397
  • [9] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing
    Cheng, Qimin
    Zhou, Yuzhuo
    Fu, Peng
    Xu, Yuan
    Zhang, Liang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 4284 - 4297
  • [10] Multi-Modal Memory Enhancement Attention Network for Image-Text Matching
    Ji, Zhong
    Lin, Zhigang
    Wang, Haoran
    He, Yuqing
    IEEE ACCESS, 2020, 8 : 38438 - 38447