Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1
|
Authors
Li, Jiayi [1 ]
Jiang, Min [1 ]
Kong, Jun [2 ]
Tao, Xuefeng [2 ]
Luo, Xi [1 ]
Affiliations
[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;
DOI
10.1109/TMM.2024.3410129
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge in the TBPR task is how to map cross-modal information into a latent common space and learn a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism within both images and natural language expressions for the same individual. Moreover, these methods have also ignored the impact of the semantic-polymorphism-based intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to enhance inter-modal connections; however, the process of information reconstruction remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, built on the strengths of pre-trained cross-modal models. Firstly, to learn more robust cross-modal information representations, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our information representations. Secondly, to attain a more concentrated intra-modal information representation based on semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Thirdly, to further explore the potential of cross-modal interactions within the model, we design the implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to elevate overall performance. Extensive experiments on both the CUHK-PEDES and ICFG-PEDES datasets show that we achieve state-of-the-art results on Rank-1, mAP, and mINP compared to existing methods.
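The retrieval setting the abstract describes (scoring a text query against a gallery of person images in a shared embedding space, then ranking by similarity for Rank-1/mAP evaluation) can be illustrated with a minimal sketch. This is not the authors' LSPM implementation; the function names and toy embeddings below are purely illustrative of the common-space matching step.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so dot product = cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def rank_gallery(text_emb, image_embs):
    # Score one text query against every gallery image in the shared space
    # and return gallery indices sorted from best match to worst.
    sims = l2_normalize(image_embs) @ l2_normalize(text_emb)
    return np.argsort(-sims)

# Toy shared space: the query description is closest to gallery item 2.
query = np.array([1.0, 0.0, 0.2])
gallery = np.array([
    [0.0, 1.0, 0.0],    # different person
    [-1.0, 0.1, 0.0],   # different person
    [0.9, 0.05, 0.25],  # same person as the query description
])
order = rank_gallery(query, gallery)
print(order[0])  # index of the Rank-1 retrieved image -> 2
```

Rank-1 accuracy then counts how often `order[0]` is a true match; mAP and mINP are computed from the full ranking `order`.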
Pages: 10678 - 10691
Number of pages: 14
Related Papers
50 records in total
  • [41] Summarization of Text and Image Captioning in Information Retrieval Using Deep Learning Techniques
    Mahalakshmi, P.
    Fatima, N. Sabiyath
    IEEE ACCESS, 2022, 10 : 18289 - 18297
  • [42] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [43] Learning Dual Semantic Relations With Graph Attention for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    Cheng, Qingrong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07) : 2866 - 2879
  • [44] Commonsense-Guided Semantic and Relational Consistencies for Image-Text Retrieval
    Li, Wenhui
    Yang, Song
    Li, Qiang
    Li, Xuanya
    Liu, An-An
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1867 - 1880
  • [45] EAES: Effective Augmented Embedding Spaces for Text-Based Image Captioning
    Khang Nguyen
    Bui, Doanh C.
    Truc Trinh
    Vo, Nguyen D.
    IEEE ACCESS, 2022, 10 : 32443 - 32452
  • [46] The Research and Application in Intelligent Document Retrieval Based on Text Quantification and Subject Mapping
    Wang, Qin
    Qu, Shouning
    Du, Tao
    Zhang, Mingjing
    ADVANCED DESIGNS AND RESEARCHES FOR MANUFACTURING, PTS 1-3, 2013, 605-607 : 2561 - +
  • [47] Extractive Automatic Text Summarization Based on Lexical-Semantic Keywords
    Hernandez-Castaneda, Angel
    Arnulfo Garcia-Hernandez, Rene
    Ledeneva, Yulia
    Eduardo Millan-Hernandez, Christian
    IEEE ACCESS, 2020, 8 : 49896 - 49907
  • [48] Reading-Strategy Inspired Visual Representation Learning for Text-to-Video Retrieval
    Dong, Jianfeng
    Wang, Yabing
    Chen, Xianke
    Qu, Xiaoye
    Li, Xirong
    He, Yuan
    Wang, Xun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5680 - 5694
  • [49] Learning and Integrating Multi-Level Matching Features for Image-Text Retrieval
    Lan, Hong
    Zhang, Pufen
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 374 - 378
  • [50] Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval
    Lin, Dixuan
    Peng, Yi-Xing
    Meng, Jingke
    Zheng, Wei-Shi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6609 - 6620