Full-view salient feature mining and alignment for text-based person search
Cited by: 4
Authors:
Xie, Sheng [1]
Zhang, Canlong [1,2]
Ning, Enhao [1]
Li, Zhixin [1,2]
Wang, Zhiwen [3]
Wei, Chunrong [4]
Affiliations:
[1] Guangxi Normal Univ, Key Lab Educ Blockchain & Intelligent Technol, Minist Educ, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[3] Guangxi Univ Sci & Technol, Sch Comp Sci & Technol, Liuzhou 545006, Peoples R China
[4] Guangxi Normal Univ, Teachers Coll Vocat & Tech Educ, Guilin 541004, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Text-based person search;
Diffusion;
Full-view;
Generation;
Text attention;
OPTIMIZATION;
NETWORK;
DOI:
10.1016/j.eswa.2024.124071
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
Text-based person search aims to retrieve relevant person images from a large database given textual queries. However, the single-view limitation of surveillance cameras and cross-modal heterogeneity remain challenging open issues. To address these issues, we propose a Full-view Salient Feature Mining Network (FLAN) to improve text-image matching in this task. FLAN introduces two key innovations. First, the Diffusion-based Full-view Image Augmentation generates informative full-view data from a single image to simulate human visual observation and learn view-invariant features. Second, the Dual-max Text Attention module optimizes spatial and channel-wise text attention to extract the most discriminative words characterizing the person. Together, these innovations handle insufficient, imbalanced, and heterogeneous data for more accurate matching. Extensive experiments on three text-based person search datasets, CUHK-PEDES, ICFG-PEDES, and RSTPReid, demonstrate the superior performance of FLAN, with improved robustness and generalization.
Pages: 13