Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1

Authors:

Li, Jiayi [1]

Jiang, Min [1]

Kong, Jun [2]

Tao, Xuefeng [2]

Luo, Xi [1]

Affiliations:

[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China

[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;

DOI:

10.1109/TMM.2024.3410129

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge inherent in the TBPR task revolves around how to map cross-modal information to a potential common space and learn a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism within both images and natural language expressions for the same individual. Moreover, these methods have also ignored the impact of semantic polymorphism-based intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to enhance inter-modal connections. However, the process of information reconstruction remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, facilitated by the prowess of pre-trained cross-modal models. Firstly, to learn cross-modal information representations with better robustness, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our information representations. Secondly, to attain a more concentrated intra-modal information representation based on semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Thirdly, to further explore the potential of cross-modal interactions within the model, we design the implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to elevate overall performance. Extensive experiments on both CUHK-PEDES and ICFG-PEDES datasets show that we achieve state-of-the-art results on Rank-1, mAP and mINP compared to existing methods.


Pages: 10678-10691

Page count: 14

50 entries in total

[21] Cross-Modal Feature Fusion-Based Knowledge Transfer for Text-Based Person Search
You, Kaiyang
Chen, Wenjing
Wang, Chengji
Sun, Hao
Xie, Wei
IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2230 - 2234
[22] A Review of Text-Based Recommendation Systems
Kanwal, Safia
Nawaz, Sidra
Malik, Muhammad Kamran
Nawaz, Zubair
IEEE ACCESS, 2021, 9 : 31638 - 31661
[23] MINING FALSE POSITIVE EXAMPLES FOR TEXT-BASED PERSON RE-IDENTIFICATION
Xu, Wenhao
Shao, Zhiyin
Ding, Changxing
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1680 - 1684
[24] SEM-CS: SEMANTIC CLIPSTYLER FOR TEXT-BASED IMAGE STYLE TRANSFER
Kamra, Chanda Grover
Mastan, Indra Deep
Gupta, Debayan
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 395 - 399
[25] Switching Text-Based Image Encoders for Captioning Images With Text
Ueda, Arisa
Yang, Wei
Sugiura, Komei
IEEE ACCESS, 2023, 11 : 55706 - 55715
[26] Part-Based Multi-Scale Attention Network for Text-Based Person Search
Wang, Yubin
Qi, Ding
Zhao, Cairong
PATTERN RECOGNITION AND COMPUTER VISION, PT I, PRCV 2022, 2022, 13534 : 462 - 474
[27] Text-Based Person re-ID by Saliency Mask and Dynamic Label Smoothing
Pang, Yonghua
Zhang, Canlong
Li, Zhixin
Hu, Liaojie
NEURAL INFORMATION PROCESSING, ICONIP 2023, PT V, 2024, 14451 : 443 - 454
[28] Decentralized Text-Based Person Re-Identification in Multi-Camera Networks
Agyeman, Rockson
Rinner, Bernhard
IEEE ACCESS, 2024, 12 : 172125 - 172148
[29] Link-Driven Study to Enhance Text-Based Image Retrieval: Implicit Links Versus Explicit Links
Gasmi, Karim
Aouadi, Hatem
Torjmen, Mouna
IEEE ACCESS, 2023, 11 : 90526 - 90537
[30] Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval
Wang, Wei
Gao, Junyu
Yang, Xiaoshan
Xu, Changsheng
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2386 - 2397
