Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1

Authors:

Li, Jiayi [1]

Jiang, Min [1]

Kong, Jun [2]

Tao, Xuefeng [2]

Luo, Xi [1]

Affiliations:

[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China

[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;

DOI:

10.1109/TMM.2024.3410129

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge inherent in the TBPR task revolves around how to map cross-modal information to a potential common space and learn a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism within both images and natural language expressions for the same individual. Moreover, these methods have also ignored the impact of semantic polymorphism-based intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to enhance inter-modal connections. However, the process of information reconstruction remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, facilitated by the prowess of pre-trained cross-modal models. Firstly, to learn cross-modal information representations with better robustness, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our information representations. Secondly, to attain a more concentrated intra-modal information representation based on semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Thirdly, to further explore the potential of cross-modal interactions within the model, we design the implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to elevate overall performance. Extensive experiments on both the CUHK-PEDES and ICFG-PEDES datasets show that we achieve state-of-the-art results on Rank-1, mAP and mINP compared to existing methods.
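
The abstract reports results on Rank-1, mAP and mINP. As a minimal sketch of how these retrieval metrics are commonly computed, assuming a precomputed text-to-image similarity matrix and integer identity labels (the function name `retrieval_metrics` and its arguments are hypothetical and not taken from the paper), one could write:

```python
import numpy as np


def retrieval_metrics(sim, query_ids, gallery_ids):
    """Return (rank1, mAP, mINP) for a text-to-image similarity matrix.

    sim[i, j] is the similarity between text query i and gallery image j.
    All names here are illustrative assumptions, not the authors' code.
    """
    sim = np.asarray(sim, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Gallery indices for each query, sorted from most to least similar.
    order = np.argsort(-sim, axis=1)
    # True where the ranked gallery image shares the query's identity.
    matches = gallery_ids[order] == query_ids[:, None]

    rank1, aps, inps = [], [], []
    for row in matches:
        hits = np.flatnonzero(row)  # 0-based ranks of the correct images
        if hits.size == 0:
            continue  # skip queries with no correct image in the gallery
        rank1.append(float(row[0]))
        # Precision at each correct hit: (k-th hit) / (rank of that hit).
        precisions = np.arange(1, hits.size + 1) / (hits + 1)
        aps.append(precisions.mean())
        # Inverse negative penalty: number of correct images divided by
        # the rank of the hardest (lowest-ranked) correct image.
        inps.append(hits.size / (hits[-1] + 1))
    return float(np.mean(rank1)), float(np.mean(aps)), float(np.mean(inps))
```

Here Rank-1 is the fraction of queries whose top-ranked image has the correct identity, mAP averages the per-query average precision, and mINP averages the inverse negative penalty, which rewards ranking even the hardest correct image early. The paper's exact evaluation protocol may differ in details.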


Pages: 10678-10691

Page count: 14
