From attributes to natural language: A survey and foresight on text-based person re-identification

Cited by: 0
Authors
Jiang, Fanzhi [1, 2]
Yang, Su [1, 2]
Jones, Mark W. [1]
Zhang, Liumei [3, 4]
Affiliations
[1] Swansea Univ, Sch Math & Comp Sci, Fabian Way, Swansea SA1 8EN, Wales
[2] Swansea Univ, Comp Vis & Machine Learning Lab, Fabian Way, Swansea SA1 8EN, Wales
[3] Xian Shiyou Univ, Sch Comp Sci, Dianzi 2nd Rd, Xian 710065, Shaanxi, Peoples R China
[4] Xian Shiyou Univ, Chengyin Lab, Dianzi 2nd Rd, Xian 710065, Shaanxi, Peoples R China
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Person re-identification; Text; Natural language; Attributes; Diffusion model; ATTENTION NETWORK; TRANSFORMER;
DOI
10.1016/j.inffus.2024.102879
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attribute or natural language descriptions. Despite its wide range of applications, such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews that summarize text-based person Re-ID from a technical perspective. To address this gap, we introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, elucidating fundamental concepts related to attribute-based and natural language-based identification. We then present a thorough examination of existing benchmark datasets and metrics. Subsequently, we delve into prevalent feature extraction strategies employed in text-based person Re-ID research, followed by a concise summary of common network architectures within the domain. Prevalent loss functions used for model optimization and modality alignment in text-based person Re-ID are also scrutinized. To conclude, we offer a concise summary of our findings, pinpointing challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation guided re-identification (TBPGR).
Pages: 23