MACA: Memory-aided Coarse-to-fine Alignment for Text-based Person Search

Cited by: 2
Authors
Su, Liangxu [1]
Quan, Rong [1]
Qi, Zhiyuan [1]
Qin, Jie [1]
Affiliation
[1] Nanjing University of Aeronautics and Astronautics, Nanjing, People's Republic of China
Source
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Person Search; Cross-modality Retrieval; Person Re-identification;
DOI
10.1145/3626772.3657915
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-based person search (TBPS) aims to find a target person in full scene images from a textual description. The key to this task is effective cross-modality alignment between text and images. In this paper, we propose a novel TBPS framework, named Memory-Aided Coarse-to-fine Alignment (MACA), to learn an accurate and reliable alignment between the two modalities. First, we introduce a proposal-based alignment module, which uses contrastive learning to align the textual modality with different pedestrian proposals at a coarse-grained level. Second, for fine-grained alignment, we propose an attribute-based alignment module that mitigates unreliable features by aligning text-wise details with image-wise global features. Moreover, we introduce an intuitive memory bank strategy that supplements useful negative samples for more effective contrastive learning, improving the convergence and generalization ability of the model based on the learned discriminative features. Extensive experiments on CUHK-SYSU-TBPS and PRW-TBPS demonstrate the superiority of MACA over state-of-the-art approaches. The code is available at https://github.com/suliangxu/MACA.
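The idea of supplementing contrastive learning with negatives from a memory bank can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked in the abstract for that); it assumes a standard InfoNCE-style loss over cosine similarities and a simple first-in-first-out bank. The names `MemoryBank`, `info_nce`, and `cosine`, the temperature value, and the toy feature vectors are all hypothetical and chosen only for illustration.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log softmax score of the positive pair.

    `temperature` is an assumed hyperparameter, not a value from the paper.
    """
    logits = [cosine(text, positive) / temperature]
    logits += [cosine(text, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


class MemoryBank:
    """Hypothetical FIFO store of image features from earlier batches."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest features when full.
        self.features = deque(maxlen=capacity)

    def enqueue(self, feats):
        self.features.extend(feats)

    def negatives(self):
        return list(self.features)


# Toy 2-D features, purely illustrative.
bank = MemoryBank(capacity=4)
bank.enqueue([[0.0, 1.0], [-1.0, 0.2]])

text = [1.0, 0.1]          # text query embedding
positive = [0.9, 0.2]      # matching pedestrian proposal
in_batch = [[0.1, 1.0]]    # negatives from the current batch

loss_batch = info_nce(text, positive, in_batch)
loss_bank = info_nce(text, positive, in_batch + bank.negatives())
print(loss_batch, loss_bank)
```

In this sketch the bank only enlarges the set of negatives in the softmax denominator, so the loss with bank negatives is never smaller than the in-batch loss. How MACA selects "useful" negatives and updates its bank is not specified in the abstract and is not modeled here.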
Pages: 2497-2501
Number of pages: 5