Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 3
Authors
Li, Jiayi [1 ]
Jiang, Min [1 ]
Kong, Jun [2 ]
Tao, Xuefeng [2 ]
Luo, Xi [1 ]
Affiliations
[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;
DOI
10.1109/TMM.2024.3410129
CLC Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812 ;
Abstract
Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge in TBPR is how to map cross-modal information into a shared latent space and learn a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism of both images and natural language expressions describing the same individual. They have also ignored the impact of the semantic-polymorphism-driven intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to strengthen inter-modal connections; however, the reconstruction process remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, built on pre-trained cross-modal models. First, to learn more robust cross-modal information representations, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our representations. Second, to obtain a more concentrated intra-modal representation under semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Third, to further exploit cross-modal interactions within the model, we design an implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to elevate overall performance. Extensive experiments on the CUHK-PEDES and ICFG-PEDES datasets show that LSPM achieves state-of-the-art results on Rank-1, mAP and mINP compared to existing methods.
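The retrieval setup the abstract describes — text and image features mapped into a common space and ranked by similarity — can be sketched as a toy example. The random projections below merely stand in for the pre-trained encoders, and all names are illustrative, not the paper's actual implementation:

```python
import numpy as np

# Toy sketch of text-based person retrieval in a shared embedding
# space. Random vectors stand in for encoder outputs; this is not
# the LSPM architecture, only the scoring/ranking step it relies on.

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Project features onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Pretend common-space features: 5 gallery images, 256 dimensions.
gallery = l2_normalize(rng.standard_normal((5, 256)))

# A text query whose embedding lies near gallery image 2
# (simulating a well-aligned cross-modal pair).
query = l2_normalize(gallery[2] + 0.05 * rng.standard_normal(256))

# Cosine similarity between the query and each gallery image,
# then rank the gallery from most to least similar.
scores = gallery @ query
ranking = np.argsort(-scores)
print(ranking[0])  # the matching identity should rank first
```

Rank-1 accuracy, mAP and mINP — the metrics reported in the abstract — are all computed from such per-query rankings over the gallery.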
Pages: 10678-10691
Page count: 14