Multilateral Semantic Relations Modeling for Image Text Retrieval

Cited by: 14
Authors
Wang, Zheng [1 ,3 ]
Gao, Zhenwei [1 ]
Guo, Kangshuai [1 ]
Yang, Yang [1 ]
Wang, Xiaoming [1 ]
Shen, Heng Tao [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] UESTC Guangdong, Inst Elect & Informat Engn, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.00277
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text retrieval is a fundamental task for bridging vision and language by exploiting various strategies for fine-grained alignment between regions and words. It remains difficult mainly because of one-to-many correspondence: a single query can have a whole set of valid matches in the other modality. Although existing solutions to this problem, including multi-point mapping, probabilistic distributions, and geometric embeddings, have made promising progress, one-to-many correspondence is still under-explored. In this work, we develop Multilateral Semantic Relations Modeling (MSRM) for image-text retrieval, which captures the one-to-many correspondence between multiple samples and a given query via hypergraph modeling. Specifically, a given query is first mapped to a probabilistic embedding to learn its true semantic distribution based on the Mahalanobis distance. Then each candidate instance in a mini-batch is regarded as a hypergraph node with its mean semantics, while a Gaussian query is modeled as a hyperedge to capture semantic correlations beyond the pairwise relation between candidate points and the query. Comprehensive experimental results on two widely used datasets demonstrate that MSRM can outperform state-of-the-art methods in handling multiple matches while maintaining comparable performance on instance-level matching.
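The idea of a Gaussian query acting as a hyperedge over candidate nodes can be illustrated with a small sketch. This is not the authors' implementation: the diagonal covariance, the exponential scoring, the `temperature` parameter, and the function names `mahalanobis_diag` and `hyperedge_weights` are all assumptions made for illustration. It only shows how a Mahalanobis distance to a query distribution could yield soft membership weights for several candidates at once.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance from point x to a Gaussian with diagonal covariance.

    Assumption: a diagonal covariance is used here purely to keep the sketch short.
    """
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

def hyperedge_weights(candidates, query_mean, query_var, temperature=1.0):
    """Soft membership of each candidate node in the hyperedge of one Gaussian query.

    Hypothetical scoring: exp(-distance / temperature), normalized to sum to 1.
    """
    dists = [mahalanobis_diag(c, query_mean, query_var) for c in candidates]
    scores = [math.exp(-d / temperature) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]

# Toy data: three candidate mean embeddings and one Gaussian query.
candidates = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
query_mean = [0.0, 0.0]
query_var = [1.0, 9.0]  # the query is more uncertain along the second axis

weights = hyperedge_weights(candidates, query_mean, query_var)
print(weights)
```

In this toy setup the second and third candidates receive equal weight even though the third is three times farther away in Euclidean terms, because the query's variance is larger along that axis. That is the sense in which a distribution-valued query can relate to several candidates rather than to a single nearest point.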
Pages: 2830-2839
Page count: 10