Multilateral Semantic Relations Modeling for Image Text Retrieval

Cited by: 14
Authors
Wang, Zheng [1, 3]
Gao, Zhenwei [1]
Guo, Kangshuai [1]
Yang, Yang [1]
Wang, Xiaoming [1]
Shen, Heng Tao [1, 2]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] UESTC Guangdong, Inst Elect & Informat Engn, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.00277
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image-text retrieval is a fundamental task for bridging vision and language, and existing methods exploit various strategies for fine-grained alignment between image regions and words. The task remains challenging mainly because of one-to-many correspondence: a single query can match a whole set of items from the other modality. Existing solutions to this problem, including multi-point mapping, probabilistic distributions, and geometric embeddings, have made promising progress, but one-to-many correspondence is still under-explored. In this work, we develop Multilateral Semantic Relations Modeling (MSRM) for image-text retrieval, which captures the one-to-many correspondence between multiple samples and a given query via hypergraph modeling. Specifically, a given query is first mapped to a probabilistic embedding that learns its true semantic distribution based on the Mahalanobis distance. Each candidate instance in a mini-batch is then regarded as a hypergraph node represented by its mean semantics, while a Gaussian query is modeled as a hyperedge that captures semantic correlations beyond pairwise relations between the candidate points and the query. Comprehensive experiments on two widely used datasets demonstrate that MSRM outperforms state-of-the-art methods at handling multiple matches while maintaining comparable performance on instance-level matching.
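The abstract describes two steps: scoring a candidate against a Gaussian query with the Mahalanobis distance, and treating that query as a hyperedge over the candidate nodes. The minimal sketch below illustrates both ideas. It assumes a diagonal covariance and a Gaussian-kernel soft membership exp(-d²/2). These choices, the threshold `tau`, and the function names `mahalanobis_diag`, `soft_incidence`, and `hyperedge_members` are hypothetical illustrations, not the MSRM formulation from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mahalanobis_diag(x: Vector, mu: Vector, var: Vector) -> float:
    """Mahalanobis distance from point x to N(mu, diag(var)).

    Assumption (hypothetical): diagonal covariance; all variances positive.
    """
    if not (len(x) == len(mu) == len(var)):
        raise ValueError("x, mu and var must have the same dimension")
    if any(v <= 0.0 for v in var):
        raise ValueError("variances must be positive")
    # Each squared deviation is divided by that dimension's variance,
    # so uncertain (high-variance) dimensions count for less.
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var)))


def soft_incidence(
    candidates: Sequence[Vector],
    query_means: Sequence[Vector],
    query_vars: Sequence[Vector],
) -> List[List[float]]:
    """Incidence matrix H: one row per candidate node, one column per
    Gaussian query hyperedge.

    Hypothetical choice: H[v][e] = exp(-0.5 * d_M(v, e)^2), the shape of
    an unnormalized Gaussian density, so every entry lies in (0, 1].
    """
    return [
        [math.exp(-0.5 * mahalanobis_diag(c, mu, var) ** 2)
         for mu, var in zip(query_means, query_vars)]
        for c in candidates
    ]


def hyperedge_members(H: List[List[float]], e: int, tau: float) -> List[int]:
    """Indices of candidate nodes whose membership in hyperedge e is at
    least tau (hypothetical threshold): one query, possibly many matches."""
    return [v for v, row in enumerate(H) if row[e] >= tau]


if __name__ == "__main__":
    candidates = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    H = soft_incidence(candidates, [[0.0, 0.0]], [[1.0, 1.0]])
    print(hyperedge_members(H, 0, tau=0.5))  # [0, 1]
```

A larger variance along a dimension shrinks the distance. With `var = [4.0, 1.0]`, the point `[2.0, 0.0]` lies at Mahalanobis distance 1.0 from the origin, not at its Euclidean distance of 2.0. A single query hyperedge can therefore cover several candidates that fall inside its spread.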
Pages: 2830-2839
Number of pages: 10
Related papers
50 in total
  • [1] Semantic Completion and Filtration for Image-Text Retrieval
    Yang, Song
    Li, Qiang
    Li, Wenhui
    Li, Xuan-Ya
    Jin, Ran
    Lv, Bo
    Wang, Rui
    Liu, Anan
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [2] Weighted Semantic Fusion of Text and Content for Image Retrieval
    Goel, Nidhi
    Sehgal, Priti
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 681 - 687
  • [3] Characterization and classification of semantic image-text relations
    Otto, Christian
    Springstein, Matthias
    Anand, Avishek
    Ewerth, Ralph
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2020, 9 (01) : 31 - 45
  • [5] Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval
    Liu, Delong
    Li, Haiwen
    Zhao, Zhicheng
    Dong, Yuan
    NEURAL NETWORKS, 2025, 184
  • [6] Semantic Representation of Text Captions to Aid Sport Image Retrieval
    Kesorn, Kraisak
    Poslad, Stefan
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS SYSTEMS (ISPACS 2008), 2008, : 387 - 390
  • [7] Understanding, Categorizing and Predicting Semantic Image-Text Relations
    Otto, Christian
    Springstein, Matthias
    Anand, Avishek
    Ewerth, Ralph
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 168 - 176
  • [8] Learning to Embed Semantic Similarity for Joint Image-Text Retrieval
    Malali, Noam
    Keller, Yosi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 10252 - 10260
  • [9] Cross-Modal Image-Text Retrieval with Semantic Consistency
    Chen, Hui
    Ding, Guiguang
    Lin, Zijin
    Zhao, Sicheng
    Han, Jungong
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1749 - 1757
  • [10] A full-text framework for the image retrieval signal/semantic integration
    Belkhatir, M
    Mulhem, P
    Chiaramella, Y
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2005, 3588 : 113 - 123