MGRL: MUTUAL-GUIDANCE REPRESENTATION LEARNING FOR TEXT-TO-IMAGE PERSON RETRIEVAL

Times cited: 0

Authors:

Lv, Tianle [1]

Li, Shuang [1]

Leng, Jiaxu [1]

Gao, Xinbo [1]

Affiliation:

[1] Chongqing Post & Commun Univ, Sch Comp Sci, Chongqing, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Funding:

National Natural Science Foundation of China

Keywords:

Person retrieval; text-image match; information interaction; mutual guidance

DOI:

10.1109/ICASSP48485.2024.10447260

Abstract:

Text-to-image person retrieval aims to recognize target pedestrians based on specified text. Existing methods mainly obtain image and text features separately through distinct feature extractors, subsequently embedding them into a unified feature space and calculating their similarity. Despite great success, current methods still suffer from the lack of information interaction between images and text. To address this issue, we propose Mutual-guidance Representation Learning (MGRL) for text-to-image person retrieval, which captures the key features for matching via text-image information interaction. Accordingly, our MGRL consists of two customized modules: iterative text-guided feature extraction (ITFE) and vision-assisted specific mask complement (VSMC). Specifically, ITFE is first designed to extract the matching information between the text and the image concerning the local feature attention of the target pedestrians by iterative text guidance. Then, to further ensure the image features extracted by ITFE contain the text description, VSMC is designed to utilize the extracted image features to help complete masked text where the mask is difficult to complete with only unmasked text information. Experiments are conducted on CUHK-PEDES and ICFG-PEDES datasets, and experimental results demonstrate the superiority of the proposed MGRL.

Citation

Pages: 2895-2899

Page count: 5

References (4 in total):

[1] Zhao, Yang; Yu, Xiaohan; Gao, Yongsheng; Shen, Chunhua. Learning discriminative region representation for person retrieval. PATTERN RECOGNITION, 2022, 121.
[2] Zhu, Aichun; Wang, Zijie; Li, Yifeng; Wan, Xili; Jin, Jing; Wang, Tian; Hu, Fangqiang; Hua, Gang. DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, pp. 209-217.
[3] Shen, Fei; Shu, Xiangbo; Du, Xiaoyu; Tang, Jinhui. Pedestrian-specific Bipartite-aware Similarity Learning for Text-based Person Retrieval. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, pp. 8922-8931.
[4] Wang, Zijie; Zhu, Aichun; Xue, Jingyi; Wan, Xili; Liu, Chao; Wang, Tian; Li, Yifeng. Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, pp. 1984-1992.