Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1
Authors
Li, Jiayi [1 ]
Jiang, Min [1 ]
Kong, Jun [2 ]
Tao, Xuefeng [2 ]
Luo, Xi [1 ]
Affiliations
[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;
DOI
10.1109/TMM.2024.3410129
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge in TBPR is mapping cross-modal information into a common latent space and learning a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism of both the images and the natural language expressions that describe the same individual. Moreover, these methods have ignored the impact of the intra-modal data distribution induced by semantic polymorphism on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to strengthen inter-modal connections, but the reconstruction process itself remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, built on pre-trained cross-modal models. First, to learn more robust cross-modal information representations, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our information representations. Second, to obtain a more concentrated intra-modal representation grounded in semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Third, to further exploit cross-modal interactions within the model, we design an implicit reasoning module, Masked Information Guided Reconstruction (MIGR), whose constraint guidance elevates overall performance. Extensive experiments on the CUHK-PEDES and ICFG-PEDES datasets show that LSPM achieves state-of-the-art results on Rank-1, mAP, and mINP compared with existing methods.
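The abstract summarizes the retrieval setting rather than the implementation. For readers unfamiliar with TBPR, the sketch below illustrates the generic dual-encoder formulation that frameworks such as LSPM build on, not the authors' code: text and image features are projected into a shared space, cosine similarity ranks the gallery for each text query, and Rank-1 counts how often the top-ranked image shares the query's identity. The random tensors stand in for the outputs of pre-trained cross-modal encoders; all shapes and identity labels are illustrative assumptions.

```python
# Minimal sketch of a generic TBPR evaluation loop (not the LSPM implementation).
# Placeholder random features stand in for encoder outputs in the shared space.
import torch
import torch.nn.functional as F

num_queries, gallery_size, dim = 8, 100, 512

text_feats = F.normalize(torch.randn(num_queries, dim), dim=-1)    # text queries
image_feats = F.normalize(torch.randn(gallery_size, dim), dim=-1)  # gallery images
gallery_ids = torch.randint(0, 20, (gallery_size,))                # person identities
query_ids = gallery_ids[torch.randint(0, gallery_size, (num_queries,))]

# Cosine similarity between every query and every gallery image.
sim = text_feats @ image_feats.t()              # (num_queries, gallery_size)
ranking = sim.argsort(dim=1, descending=True)   # best match first

# Rank-1: fraction of queries whose top-ranked image has the query identity.
top1_ids = gallery_ids[ranking[:, 0]]
rank1 = (top1_ids == query_ids).float().mean().item()
print(f"Rank-1 on random features: {rank1:.3f}")
```

The other metrics cited in the abstract, mAP and mINP, are computed from the same ranking matrix by aggregating over all correct matches per query rather than only the first one.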
Pages: 10678-10691
Page count: 14
Related Papers
50 records in total
  • [1] Conditional Feature Learning Based Transformer for Text-Based Person Search
    Gao, Chenyang
    Cai, Guanyu
    Jiang, Xinyang
    Zheng, Feng
    Zhang, Jun
    Gong, Yifei
    Lin, Fangzhou
    Sun, Xing
    Bai, Xiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6097 - 6108
  • [2] DCEL: Deep Cross-modal Evidential Learning for Text-Based Person Retrieval
    Li, Shenshen
    Xu, Xing
    Yang, Yang
    Shen, Fumin
    Mo, Yijun
    Li, Yujie
    Shen, Heng Tao
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6292 - 6300
  • [3] Improving Text-Based Person Retrieval by Excavating All-Round Information Beyond Color
    Zhu, Aichun
    Wang, Zijie
    Xue, Jingyi
    Wan, Xili
    Jin, Jing
    Wang, Tian
    Snoussi, Hichem
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 15
  • [4] Fine-grained Semantics-aware Representation Learning for Text-based Person Retrieval
    Wang, Di
    Yan, Feng
    Wang, Yifeng
    Zhao, Lin
    Liang, Xiao
    Zhong, Haodi
    Zhang, Ronghua
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 92 - 100
  • [5] Noise correspondence with evidence learning for text-based person search
    Xie, Yihan
    Zhang, Baohua
    Li, Yang
    Shan, Chongrui
    Wang, Shun
    Zhang, Jiale
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (05):
  • [6] Linguistic Hallucination for Text-Based Video Retrieval
    Fang, Sheng
    Dang, Tiantian
    Wang, Shuhui
    Huang, Qingming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9692 - 9705
  • [7] EESSO: Exploiting Extreme and Smooth Signals via Omni-frequency learning for Text-based Person Retrieval
    Xue, Jingyi
    Wang, Zijie
    Dong, Guan-Nan
    Zhu, Aichun
    IMAGE AND VISION COMPUTING, 2024, 142
  • [8] An Overview of Text-based Person Search: Recent Advances and Future Directions
    Niu, K.
    Liu, Y.
    Long, Y.
    Huang, Y.
    Wang, L.
    Zhang, Y.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 1 - 1
  • [9] Joint Token and Feature Alignment Framework for Text-Based Person Search
    Li, Shangze
    Lu, Andong
    Huang, Yan
    Li, Chenglong
    Wang, Liang
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2238 - 2242
  • [10] Cross-Modal Uncertainty Modeling With Diffusion-Based Refinement for Text-Based Person Retrieval
    Li, Shenshen
    Xu, Xing
    He, Chen
    Shen, Fumin
    Yang, Yang
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (03) : 2881 - 2893