MACA: Memory-aided Coarse-to-fine Alignment for Text-based Person Search

Cited by: 2
Authors
Su, Liangxu [1]
Quan, Rong [1]
Qi, Zhiyuan [1]
Qin, Jie [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Nanjing, Peoples R China
Source
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Person Search; Cross-modality Retrieval; Person Re-identification;
DOI
10.1145/3626772.3657915
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-based person search (TBPS) aims to search for the target person in the full image through textual descriptions. The key to addressing this task is to effectively perform cross-modality alignment between text and images. In this paper, we propose a novel TBPS framework, named Memory-Aided Coarse-to-fine Alignment (MACA), to learn an accurate and reliable alignment between the two modalities. Firstly, we introduce a proposal-based alignment module, which performs contrastive learning to accurately align the textual modality with different pedestrian proposals at a coarse-grained level. Secondly, for the fine-grained alignment, we propose an attribute-based alignment module to mitigate unreliable features by aligning text-wise details with image-wise global features. Moreover, we introduce an intuitive memory bank strategy to supplement useful negative samples for more effective contrastive learning, improving the convergence and generalization ability of the model based on the learned discriminative features. Extensive experiments on CUHK-SYSU-TBPS and PRW-TBPS demonstrate the superiority of MACA over state-of-the-art approaches. The code is available at https://github.com/suliangxu/MACA.
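The abstract's idea of supplementing contrastive learning with negatives drawn from a memory bank can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it assumes a generic InfoNCE-style loss, a FIFO queue as the bank, and a temperature of 0.07. The names `MemoryBank` and `info_nce_with_memory` are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class MemoryBank:
    """Hypothetical FIFO store of past image-side features reused as negatives."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest entries when full.
        self.features = deque(maxlen=capacity)

    def enqueue(self, feats):
        for f in feats:
            self.features.append(l2_normalize(f))

    def negatives(self):
        return list(self.features)


def info_nce_with_memory(text_feat, pos_feat, batch_negs, bank, temperature=0.07):
    """InfoNCE-style loss for one text query (an assumed, generic formulation).

    The matching proposal is the positive; in-batch negatives are extended
    with whatever features the memory bank currently holds.
    """
    t = l2_normalize(text_feat)
    candidates = [l2_normalize(pos_feat)]
    candidates += [l2_normalize(n) for n in batch_negs]
    candidates += bank.negatives()
    logits = [dot(t, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability of the positive (index 0).
    return log_sum - logits[0]
```

In this sketch, each extra negative from the bank adds a term to the softmax denominator, so the loss for a given query can only stay the same or grow. That is one way to see how a memory bank can make the contrastive objective more demanding than in-batch negatives alone.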
Pages: 2497-2501
Page count: 5