Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification

Cited: 0
Authors
Xu, Xianri [1 ]
Xu, Rongxian [2 ]
Affiliations
[1] Fujian Business Univ, Coll Informat Engn, Fuzhou 350012, Peoples R China
[2] Huaqiao Univ, Coll Engn, Quanzhou 362021, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Feature extraction; Transformers; Measurement; Identification of persons; Training; Pedestrians; Computer vision; Visualization; Encoding; Bonding; Transformer; meta learning; cross-modal; metric learning; person retrieval; NETWORK;
DOI
10.1109/ACCESS.2024.3511034
Chinese Library Classification: TP [Automation technology; computer technology]
Discipline code: 0812
Abstract
Attribute-image person re-identification (AIPR) is a meaningful yet challenging task: retrieving person images from attribute descriptions. In this paper, we propose a regularized dual modal meta metric learning (RDM³L) method for AIPR, which uses meta-learning-style training to strengthen the transformer's ability to acquire latent knowledge. During training, the data are first divided into a single-modal support set containing only images and a dual-modal query set containing both attributes and images. RDM³L introduces an attribute-image transformer (AIT), an extension of the vision transformer, as a novel feature-extraction backbone. Drawing on the idea of hard sample mining, the method designs attribute-image cross-modal meta metrics and image-image intra-modal meta metrics. A triplet loss built on these meta metrics then pulls samples of the same class together and pushes samples of different classes apart, enhancing both cross-modal and intra-modal discrimination. Finally, a regularization term aggregates query-set samples of different modalities to prevent overfitting, so that RDM³L preserves the model's generalization ability while aligning the two modalities and recognizing unseen classes. Experimental results on the PETA and Market-1501 Attribute datasets demonstrate the superiority of RDM³L, which achieves mean average precision (mAP) scores of 36.7% on the Market-1501 Attribute dataset and 60.6% on the PETA dataset.
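The hard-mining-plus-triplet step summarized in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the paper's actual formulation: the function name, the fixed margin, and the plain Euclidean distance are placeholders for the learned meta metrics and the episodic support/query training that RDM³L actually uses. For each query embedding, the hardest positive is the farthest same-class support sample and the hardest negative is the closest different-class support sample; the hinge then penalizes positives that are not closer than negatives by the margin.

```python
import numpy as np

def hard_mined_triplet_loss(query, support, labels_q, labels_s, margin=0.3):
    """Triplet loss with hard sample mining between a query set and a support set.

    query, support : (n, d) arrays of embeddings.
    labels_q, labels_s : class labels for each embedding.
    For each query, mine the hardest positive (max distance to a same-class
    support sample) and hardest negative (min distance to a different-class
    support sample), then apply the standard triplet hinge.
    """
    losses = []
    for q, lq in zip(query, labels_q):
        pos = [np.linalg.norm(q - s) for s, ls in zip(support, labels_s) if ls == lq]
        neg = [np.linalg.norm(q - s) for s, ls in zip(support, labels_s) if ls != lq]
        if not pos or not neg:
            continue  # no valid triplet for this query
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return float(np.mean(losses))
```

In the paper's setting the query set is dual-modal, so the same hinge would be applied twice: once with attribute embeddings as queries against image supports (cross-modal) and once with image queries against image supports (intra-modal).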
Pages: 183344-183353 (10 pages)