Enhancing Text-Image Person Retrieval Through Nuances Varied Sample

Cited by: 0

Authors:

Xia, Jiaer [1]

Yang, Haozhe [1]

Zhang, Yan [1]

Dai, Pingyang [1]

Affiliations:

[1] Xiamen Univ, Sch Informat, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Co, Xiamen 361005, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I | 2024 / Vol. 14425

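Funding:

European Research Council; National Key Research and Development Program of China; National Natural Science Foundation of China;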

Keywords:

Text-image person retrieval; Text-based person re-identification;

DOI:

10.1007/978-981-99-8429-9_15

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-image person retrieval is a task that involves searching for a specific individual based on a corresponding textual description. However, a key challenge in this task is achieving modal alignment while conducting fine-grained retrieval. Current methods utilize classification and metric losses to enhance discrimination and alignment. Nevertheless, the substantial dissimilarities between samples often impede the network's capacity to learn discriminative fine-grained information. To tackle this issue and enable the network to focus on intricate details, we introduce the Nuanced Variation Module (NVM). This module generates artificially difficult negative samples, which serve as a guide for directing the network's attention towards discerning nuances. The incorporation of NVM-constructed hard-negative samples enhances the alignment loss and facilitates the network's attentiveness to details. Additionally, we leverage the image text matching task to explicitly augment the network's fine-grained ability. By adopting our NVM method, the network can extract an ample amount of fine-grained features, thereby mitigating the interference caused by challenging negative samples. Extensive experiments demonstrate that our proposed method achieves competitive performance compared to state-of-the-art approaches on publicly available datasets.

Pages: 185-196

Page count: 12
