Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification

Cited: 0
Authors
Xu, Xianri [1 ]
Xu, Rongxian [2 ]
Affiliations
[1] Fujian Business Univ, Coll Informat Engn, Fuzhou 350012, Peoples R China
[2] Huaqiao Univ, Coll Engn, Quanzhou 362021, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Feature extraction; Transformers; Measurement; Identification of persons; Training; Pedestrians; Computer vision; Visualization; Encoding; Bonding; Transformer; meta learning; cross-modal; metric learning; person retrieval; NETWORK;
DOI
10.1109/ACCESS.2024.3511034
Chinese Library Classification: TP [Automation technology; computer technology]
Discipline code: 0812
Abstract
Attribute-image person re-identification (AIPR) is a meaningful yet challenging task: retrieving person images from attribute descriptions. In this paper, we propose a regularized dual modal meta metric learning (RDM³L) method for AIPR, which uses meta-learning-style training to strengthen the transformer's ability to acquire latent knowledge. During training, the data are first divided into a single-modal support set containing only images and a dual-modal query set containing both attributes and images. RDM³L introduces an attribute-image transformer (AIT), an extension of the vision transformer, as a novel feature-extraction backbone. Drawing on the idea of hard sample mining, the method designs attribute-image cross-modal meta metrics and image-image intra-modal meta metrics. A triplet loss built on these meta metrics then pulls samples of the same class together and pushes samples of different classes apart, enhancing both cross-modal and intra-modal discrimination. Finally, a regularization term aggregates query-set samples of different modalities to prevent overfitting, so that RDM³L preserves the model's generalization ability while aligning the two modalities and recognizing unseen classes. Experimental results on the PETA and Market-1501 Attribute datasets demonstrate the superiority of RDM³L, which achieves mean average precision (mAP) scores of 36.7% on the Market-1501 Attribute dataset and 60.6% on the PETA dataset.
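The hard-mining-plus-triplet step summarized in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the paper's actual formulation: the function name, the fixed margin, and the plain Euclidean distance are placeholders for the learned meta metrics and the episodic support/query training that RDM³L actually uses. For each query embedding, the hardest positive is the farthest same-class support sample and the hardest negative is the closest different-class support sample; the hinge then penalizes positives that are not closer than negatives by the margin.

```python
import numpy as np

def hard_mined_triplet_loss(query, support, labels_q, labels_s, margin=0.3):
    """Triplet loss with hard sample mining between a query set and a support set.

    query, support : (n, d) arrays of embeddings.
    labels_q, labels_s : class labels for each embedding.
    For each query, mine the hardest positive (max distance to a same-class
    support sample) and hardest negative (min distance to a different-class
    support sample), then apply the standard triplet hinge.
    """
    losses = []
    for q, lq in zip(query, labels_q):
        pos = [np.linalg.norm(q - s) for s, ls in zip(support, labels_s) if ls == lq]
        neg = [np.linalg.norm(q - s) for s, ls in zip(support, labels_s) if ls != lq]
        if not pos or not neg:
            continue  # no valid triplet for this query
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return float(np.mean(losses))
```

In the paper's setting the query set is dual-modal, so the same hinge would be applied twice: once with attribute embeddings as queries against image supports (cross-modal) and once with image queries against image supports (intra-modal).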
Pages: 183344-183353 (10 pages)