Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1
Authors
Li, Jiayi [1]
Jiang, Min [1]
Kong, Jun [2]
Tao, Xuefeng [2]
Luo, Xi [1]
Affiliations
[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;
DOI
10.1109/TMM.2024.3410129
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge inherent in the TBPR task revolves around how to map cross-modal information to a potential common space and learn a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism within both images and natural language expressions for the same individual. Moreover, these methods have also ignored the impact of semantic polymorphism-based intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to enhance inter-modal connections. However, the process of information reconstruction remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, facilitated by the prowess of pre-trained cross-modal models. Firstly, to learn cross-modal information representations with better robustness, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our information representations. Secondly, to attain a more concentrated intra-modal information representation based on semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Thirdly, to further explore the potential of cross-modal interactions within the model, we design the implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to elevate overall performance. Extensive experiments on both the CUHK-PEDES and ICFG-PEDES datasets show that we achieve state-of-the-art results on Rank-1, mAP, and mINP compared to existing methods.
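The abstract reports Rank-1, mAP, and mINP but does not define them. The sketch below assumes the definitions commonly used in the person re-identification literature: Rank-1 is whether the top-ranked gallery image matches, AP is the mean precision at each correct match, and INP is the number of correct matches divided by the rank of the last (hardest) correct match. The function names `query_metrics` and `evaluate` are hypothetical, and this is not taken from the LSPM evaluation code.

```python
from statistics import mean


def query_metrics(ranked_matches):
    """Metrics for one text query.

    ranked_matches: list of bools, one per gallery image, ordered by
    descending text-image similarity; True marks the queried identity.
    (Assumed input format, not the paper's.)
    """
    hits = 0
    precisions = []
    last_hit_rank = 0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at this correct match
            last_hit_rank = rank            # ends as rank of hardest match
    if hits == 0:
        raise ValueError("query has no matching gallery image")
    return {
        "rank1": 1.0 if ranked_matches[0] else 0.0,
        "ap": sum(precisions) / hits,
        "inp": hits / last_hit_rank,
    }


def evaluate(all_ranked_matches):
    """Average the per-query metrics to obtain Rank-1, mAP, and mINP."""
    per_query = [query_metrics(r) for r in all_ranked_matches]
    return {
        "Rank-1": mean(q["rank1"] for q in per_query),
        "mAP": mean(q["ap"] for q in per_query),
        "mINP": mean(q["inp"] for q in per_query),
    }


if __name__ == "__main__":
    # Two toy queries over small galleries (illustrative data only).
    print(evaluate([[True, False, True, False], [False, True]]))
```

Under these assumed definitions, mINP penalizes a model whose hardest correct match is ranked far down the list, which Rank-1 alone does not capture.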
Pages: 10678-10691
Number of pages: 14
Related papers
50 in total
  • [31] Dual-Semantic Consistency Learning for Visible-Infrared Person Re-Identification
    Zhang, Yiyuan
    Kang, Yuhao
    Zhao, Sanyuan
    Shen, Jianbing
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 1554 - 1565
  • [32] Multi-level Part-aware Feature Disentangling for Text-based Person Search
    Chen, Yuhao
    Zhang, Guoqing
    Zhang, Hongwei
    Zheng, Yuhui
    Lin, Weisi
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2801 - 2806
  • [33] A Generic Solver Combining Unsupervised Learning and Representation Learning for Breaking Text-Based Captchas
    Tian, Sheng
    Xiong, Tao
    WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020, : 860 - 871
  • [34] Dual Stream Relation Learning Network for Image-Text Retrieval
    Wu, Dongqing
    Li, Huihui
    Gu, Cang
    Guo, Lei
    Liu, Hang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 1551 - 1565
  • [35] Align and Retrieve: Composition and Decomposition Learning in Image Retrieval With Text Feedback
    Xu, Yahui
    Bin, Yi
    Wei, Jiwei
    Yang, Yang
    Wang, Guoqing
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9936 - 9948
  • [36] Deep learning for affective computing: Text-based emotion recognition in decision support
    Kratzwald, Bernhard
    Ilic, Suzana
    Kraus, Mathias
    Feuerriegel, Stefan
    Prendinger, Helmut
    DECISION SUPPORT SYSTEMS, 2018, 115 : 24 - 35
  • [37] PMG-Pyramidal Multi-Granular Matching for Text-Based Person Re-Identification
    Liu, Chao
    Xue, Jingyi
    Wang, Zijie
    Zhu, Aichun
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [38] Text2Face: Text-Based Face Generation With Geometry and Appearance Control
    Zhang, Zhaoyang
    Chen, Junliang
    Fu, Hongbo
    Zhao, Jianjun
    Chen, Shu-Yu
    Gao, Lin
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (09) : 6481 - 6492
  • [39] Actor and Action Modular Network for Text-Based Video Segmentation
    Yang, Jianhua
    Huang, Yan
    Niu, Kai
    Huang, Linjiang
    Ma, Zhanyu
    Wang, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4474 - 4489
  • [40] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing
    Cheng, Qimin
    Zhou, Yuzhuo
    Fu, Peng
    Xu, Yuan
    Zhang, Liang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 4284 - 4297