Learning Semantic Polymorphic Mapping for Text-Based Person Retrieval

Cited by: 1
|
Authors
Li, Jiayi [1 ]
Jiang, Min [1 ]
Kong, Jun [2 ]
Tao, Xuefeng [2 ]
Luo, Xi [1 ]
Affiliations
[1] Jiangnan Univ, Engn Res Ctr Intelligent Technol Healthcare, Minist Educ, Wuxi 214122, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Semantics; Image reconstruction; Feature extraction; Task analysis; Cognition; Measurement; Representation learning; Text-based person retrieval; semantic polymorphism; implicit reasoning; modality alignment;
DOI
10.1109/TMM.2024.3410129
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Text-Based Person Retrieval (TBPR) aims to identify a particular individual within an extensive image gallery using text as the query. The principal challenge in the TBPR task is how to map cross-modal information into a latent common space and learn a generic representation. Previous methods have primarily focused on aligning singular text-image pairs, disregarding the inherent polymorphism within both images and natural language expressions for the same individual. Moreover, these methods have also ignored the impact of the semantic-polymorphism-based intra-modal data distribution on cross-modal matching. Recent methods employ cross-modal implicit information reconstruction to enhance inter-modal connections; however, the process of information reconstruction remains ambiguous. To address these issues, we propose the Learning Semantic Polymorphic Mapping (LSPM) framework, built on the strengths of pre-trained cross-modal models. Firstly, to learn more robust cross-modal information representations, we design the Inter-modal Information Aggregation (Inter-IA) module to achieve cross-modal polymorphic mapping, fortifying the foundation of our information representations. Secondly, to attain a more concentrated intra-modal information representation based on semantic polymorphism, we design the Intra-modal Information Aggregation (Intra-IA) module to further constrain the embeddings. Thirdly, to further explore the potential of cross-modal interactions within the model, we design the implicit reasoning module, Masked Information Guided Reconstruction (MIGR), with constraint guidance to elevate overall performance. Extensive experiments on both the CUHK-PEDES and ICFG-PEDES datasets show that we achieve state-of-the-art results on Rank-1, mAP, and mINP compared to existing methods.
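The retrieval setting the abstract describes (scoring a text query against a gallery of person images in a shared embedding space, then ranking by similarity for Rank-1/mAP evaluation) can be illustrated with a minimal sketch. This is not the authors' LSPM implementation; the function names and toy embeddings below are purely illustrative of the common-space matching step.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so dot product = cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def rank_gallery(text_emb, image_embs):
    # Score one text query against every gallery image in the shared space
    # and return gallery indices sorted from best match to worst.
    sims = l2_normalize(image_embs) @ l2_normalize(text_emb)
    return np.argsort(-sims)

# Toy shared space: the query description is closest to gallery item 2.
query = np.array([1.0, 0.0, 0.2])
gallery = np.array([
    [0.0, 1.0, 0.0],    # different person
    [-1.0, 0.1, 0.0],   # different person
    [0.9, 0.05, 0.25],  # same person as the query description
])
order = rank_gallery(query, gallery)
print(order[0])  # index of the Rank-1 retrieved image -> 2
```

Rank-1 accuracy then counts how often `order[0]` is a true match; mAP and mINP are computed from the full ranking `order`.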
Pages: 10678 - 10691
Number of pages: 14
Related Papers
50 records in total
  • [41] Summarization of Text and Image Captioning in Information Retrieval Using Deep Learning Techniques
    Mahalakshmi, P.
    Fatima, N. Sabiyath
    IEEE ACCESS, 2022, 10 : 18289 - 18297
  • [42] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [43] Learning Dual Semantic Relations With Graph Attention for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    Cheng, Qingrong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07) : 2866 - 2879
  • [44] Commonsense-Guided Semantic and Relational Consistencies for Image-Text Retrieval
    Li, Wenhui
    Yang, Song
    Li, Qiang
    Li, Xuanya
    Liu, An-An
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1867 - 1880
  • [45] EAES: Effective Augmented Embedding Spaces for Text-Based Image Captioning
    Khang Nguyen
    Bui, Doanh C.
    Truc Trinh
    Vo, Nguyen D.
    IEEE ACCESS, 2022, 10 : 32443 - 32452
  • [46] The Research and Application in Intelligent Document Retrieval Based on Text Quantification and Subject Mapping
    Wang, Qin
    Qu, Shouning
    Du, Tao
    Zhang, Mingjing
    ADVANCED DESIGNS AND RESEARCHES FOR MANUFACTURING, PTS 1-3, 2013, 605-607 : 2561 - +
  • [47] Extractive Automatic Text Summarization Based on Lexical-Semantic Keywords
    Hernandez-Castaneda, Angel
    Arnulfo Garcia-Hernandez, Rene
    Ledeneva, Yulia
    Eduardo Millan-Hernandez, Christian
    IEEE ACCESS, 2020, 8 : 49896 - 49907
  • [48] Reading-Strategy Inspired Visual Representation Learning for Text-to-Video Retrieval
    Dong, Jianfeng
    Wang, Yabing
    Chen, Xianke
    Qu, Xiaoye
    Li, Xirong
    He, Yuan
    Wang, Xun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5680 - 5694
  • [49] Learning and Integrating Multi-Level Matching Features for Image-Text Retrieval
    Lan, Hong
    Zhang, Pufen
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 374 - 378
  • [50] Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval
    Lin, Dixuan
    Peng, Yi-Xing
    Meng, Jingke
    Zheng, Wei-Shi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6609 - 6620