From attributes to natural language: A survey and foresight on text-based person re-identification

Cited by: 0
Authors
Jiang, Fanzhi [1 ,2 ]
Yang, Su [1 ,2 ]
Jones, Mark W. [1 ]
Zhang, Liumei [3 ,4 ]
Affiliations
[1] Swansea Univ, Sch Math & Comp Sci, Fabian Way, Swansea SA1 8EN, Wales
[2] Swansea Univ, Comp Vis & Machine Learning Lab, Fabian Way, Swansea SA1 8EN, Wales
[3] Xian Shiyou Univ, Sch Comp Sci, Dianzi 2nd Rd, Xian 710065, Shaanxi, Peoples R China
[4] Xian Shiyou Univ, Chengyin Lab, Dianzi 2nd Rd, Xian 710065, Shaanxi, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Person re-identification; Text; Natural language; Attributes; Diffusion model; Attention network; Transformer
DOI
10.1016/j.inffus.2024.102879
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-based person re-identification (Re-ID) is a challenging topic in complex multimodal analysis; its ultimate aim is to recognize specific pedestrians from attribute or natural language descriptions. Despite its wide range of applications, such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews that summarize text-based person Re-ID from a technical perspective. To address this gap, we introduce a taxonomy spanning the Evaluation, Strategy, Architecture, and Optimization dimensions and provide a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, explaining the fundamental concepts of attribute-based and natural language-based identification. We then examine existing benchmark datasets and metrics. Next, we review the prevalent feature extraction strategies used in text-based person Re-ID research, followed by a concise summary of common network architectures in the domain. We also examine the prevalent loss functions used for model optimization and modality alignment. To conclude, we summarize our findings and pinpoint open challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation guided re-identification (TBPGR).
Pages: 23