Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1
Authors
Li, Jiayi [1]
Jiang, Min [1]
Kong, Jun [2]
Tao, Xuefeng [2]
Luo, Xi [1]
Affiliations
[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment
DOI
10.1109/TMM.2024.3410129
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge in the TBPR task is how to map cross-modal information to a latent common space and learn a generic representation. Previous methods have primarily focused on aligning single text-image pairs, disregarding the inherent polymorphism within both images and natural language expressions for the same individual. Moreover, these methods have ignored the impact of semantic polymorphism-based intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to enhance inter-modal connections. However, the process of information reconstruction remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, built on pre-trained cross-modal models. Firstly, to learn more robust cross-modal information representations, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, strengthening the foundation of our information representations. Secondly, to attain a more concentrated intra-modal information representation based on semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Thirdly, to further explore the potential of cross-modal interactions within the model, we design an implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to improve overall performance. Extensive experiments on both the CUHK-PEDES and ICFG-PEDES datasets show that we achieve state-of-the-art results on Rank-1, mAP and mINP compared to existing methods.
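The abstract reports results on Rank-1, mAP and mINP. As a rough illustration of how these three retrieval metrics are commonly computed from a text-to-image similarity matrix, here is a minimal sketch. It is not the authors' evaluation code: the function name `retrieval_metrics`, its arguments, and the toy data are assumptions made for this example, and mINP is taken in its usual sense of the number of correct gallery images divided by the rank of the last correct one, averaged over queries.

```python
from typing import List, Sequence, Tuple


def retrieval_metrics(
    sim: Sequence[Sequence[float]],   # sim[i][j]: score of text query i vs. gallery image j
    query_ids: Sequence[int],         # person identity of each text query
    gallery_ids: Sequence[int],       # person identity of each gallery image
) -> Tuple[float, float, float]:
    """Return (Rank-1, mAP, mINP), each averaged over queries that have a match."""
    rank1_hits, ap_sum, inp_sum, valid = 0, 0.0, 0.0, 0
    for scores, qid in zip(sim, query_ids):
        # Sort gallery indices by descending similarity to this query.
        order = sorted(range(len(scores)), key=lambda j: -scores[j])
        matches: List[bool] = [gallery_ids[j] == qid for j in order]
        n_pos = sum(matches)
        if n_pos == 0:
            continue  # no image of this person in the gallery; skip the query
        valid += 1

        # Rank-1: is the top-ranked image a correct match?
        rank1_hits += int(matches[0])

        # Average precision: mean of precision values at each correct match.
        hits, precision_sum, hardest_rank = 0, 0.0, 0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank
                hardest_rank = rank  # rank of the last (hardest) correct match
        ap_sum += precision_sum / n_pos

        # Inverse negative penalty: n_pos / rank of the hardest correct match.
        inp_sum += n_pos / hardest_rank

    if valid == 0:
        raise ValueError("no query has a matching gallery image")
    return rank1_hits / valid, ap_sum / valid, inp_sum / valid


# Toy data (hypothetical): two text queries, three gallery images.
toy_sim = [[0.9, 0.1, 0.8],
           [0.2, 0.5, 0.6]]
toy_metrics = retrieval_metrics(toy_sim, query_ids=[1, 2], gallery_ids=[1, 2, 1])
```

In the toy data, the first query ranks both of its matching images first, while the second query's only match lands at rank 2, so all three metrics fall below 1.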
Pages: 10678-10691
Page count: 14