MGRL: MUTUAL-GUIDANCE REPRESENTATION LEARNING FOR TEXT-TO-IMAGE PERSON RETRIEVAL

Cited by: 0
Authors
Lv, Tianle [1 ]
Li, Shuang [1 ]
Leng, Jiaxu [1 ]
Gao, Xinbo [1 ]
Affiliations
[1] Chongqing Post & Commun Univ, Sch Comp Sci, Chongqing, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Funding
National Natural Science Foundation of China
Keywords
Person retrieval; text-image match; information interaction; mutual guidance;
DOI
10.1109/ICASSP48485.2024.10447260
Abstract
Text-to-image person retrieval aims to recognize target pedestrians based on specified text. Existing methods mainly obtain image and text features separately through distinct feature extractors, subsequently embedding them into a unified feature space and calculating their similarity. Despite great success, current methods still suffer from the lack of information interaction between images and text. To address this issue, we propose Mutual-guidance Representation Learning (MGRL) for text-to-image person retrieval, which captures the key features for matching via text-image information interaction. Accordingly, our MGRL consists of two customized modules: iterative text-guided feature extraction (ITFE) and vision-assisted specific mask complement (VSMC). Specifically, ITFE is first designed to extract the matching information between the text and the image concerning the local feature attention of the target pedestrians by iterative text guidance. Then, to further ensure the image features extracted by ITFE contain the text description, VSMC is designed to utilize the extracted image features to help complete masked text where the mask is difficult to complete with only unmasked text information. Experiments are conducted on CUHK-PEDES and ICFG-PEDES datasets, and experimental results demonstrate the superiority of the proposed MGRL.
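The separate-encoder baseline that the abstract contrasts against (embed text and images independently, then compare them in a shared space) can be sketched as follows. This is only an illustration of the ranking step, not the MGRL method: the embeddings are toy vectors, cosine similarity is an assumed choice of similarity measure, and the function names `l2_normalize`, `cosine_similarity`, and `rank_gallery` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_gallery(text_embedding, image_embeddings):
    """Return (gallery_index, score) pairs sorted from best to worst match.

    Assumption: the text and image embeddings already live in the same
    feature space, as produced by two separate (hypothetical) encoders.
    """
    scores = [
        (idx, cosine_similarity(text_embedding, emb))
        for idx, emb in enumerate(image_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data standing in for encoder outputs (not from the paper).
query = [1.0, 0.0, 1.0]
gallery = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]
ranking = rank_gallery(query, gallery)
```

In this baseline the two modalities never exchange information before the similarity is computed, which is the limitation the abstract says ITFE and VSMC are designed to address.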
Pages: 2895-2899
Page count: 5