TransReID: Transformer-based Object Re-Identification

Cited by: 558
Authors
He, Shuting [1, 2, 3]
Luo, Hao [2 ]
Wang, Pichao [2 ]
Wang, Fan [2 ]
Li, Hao [2 ]
Jiang, Wei [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Alibaba, Hangzhou, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
National Natural Science Foundation of China;
Keywords
NETWORK;
DOI
10.1109/ICCV48922.2021.01474
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting robust feature representations is one of the key challenges in object re-identification (ReID). Although convolutional neural network (CNN)-based methods have achieved great success, they only process one local neighborhood at a time and suffer from information loss on details caused by convolution and downsampling operators (e.g. pooling and strided convolution). To overcome these limitations, we propose a pure transformer-based object ReID framework named TransReID. Specifically, we first encode an image as a sequence of patches and build a transformer-based strong baseline with a few critical improvements, which achieves results competitive with CNN-based methods on several ReID benchmarks. To further enhance robust feature learning in the context of transformers, two novel modules are carefully designed. (i) The jigsaw patch module (JPM) is proposed to rearrange the patch embeddings via shift and patch shuffle operations, which generates robust features with improved discrimination ability and more diversified coverage. (ii) The side information embeddings (SIE) are introduced to mitigate feature bias towards camera/view variations by plugging in learnable embeddings to incorporate these non-visual clues. To the best of our knowledge, this is the first work to adopt a pure transformer for ReID research. Experimental results of TransReID are promising, achieving state-of-the-art performance on both person and vehicle ReID benchmarks. Code is available at https://github.com/heshuting555/TransReID.
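The two modules named in the abstract can be illustrated with a small sketch. The abstract states only that JPM rearranges patch embeddings "via shift and patch shuffle operations" and that SIE plugs in learnable embeddings for camera/view information. Everything more specific below is an assumption made for illustration and is not taken from the authors' code: the cyclic shift of the first `m` patches, the interleaving of `k` contiguous groups, the lookup table indexed by camera ID, the weight `lam`, and all function names.

```python
from typing import List, Sequence

Vector = List[float]


def shift_patches(patches: Sequence[Vector], m: int) -> List[Vector]:
    """Cyclic shift (assumed form): move the first m patches to the end."""
    return list(patches[m:]) + list(patches[:m])


def shuffle_patches(patches: Sequence[Vector], k: int) -> List[Vector]:
    """Group shuffle (assumed form): view the sequence as k contiguous
    groups and interleave them, so neighbours come from different groups."""
    n = len(patches)
    if n % k != 0:
        raise ValueError("number of patches must be divisible by k")
    size = n // k
    return [patches[g * size + i] for i in range(size) for g in range(k)]


def jigsaw_groups(patches: Sequence[Vector], m: int, k: int) -> List[List[Vector]]:
    """Shift, shuffle, then cut the rearranged sequence into k groups.
    Each group now mixes patches from different image regions."""
    rearranged = shuffle_patches(shift_patches(patches, m), k)
    size = len(rearranged) // k
    return [rearranged[g * size:(g + 1) * size] for g in range(k)]


def add_side_info(tokens: Sequence[Vector],
                  camera_table: Sequence[Vector],
                  camera_id: int,
                  lam: float = 1.0) -> List[Vector]:
    """Add one embedding, looked up by camera ID, to every token.
    camera_table stands in for parameters that would be learned."""
    side = camera_table[camera_id]
    return [[t + lam * s for t, s in zip(token, side)] for token in tokens]


# Eight one-dimensional "patch embeddings" whose value is their position.
patches = [[float(i)] for i in range(8)]
groups = jigsaw_groups(patches, m=2, k=2)
print([[p[0] for p in g] for g in groups])
```

In this toy run each of the two groups contains patches from both halves of the original sequence, which is the kind of diversified coverage the abstract attributes to JPM. In a real model the camera table would be trained jointly with the network rather than fixed.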
Pages: 14993-15002
Page count: 10