TransReID: Transformer-based Object Re-Identification

Cited by: 558
Authors
He, Shuting [1 ,2 ,3 ]
Luo, Hao [2 ]
Wang, Pichao [2 ]
Wang, Fan [2 ]
Li, Hao [2 ]
Jiang, Wei [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Alibaba, Hangzhou, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
National Natural Science Foundation of China;
Keywords
NETWORK;
DOI
10.1109/ICCV48922.2021.01474
CLC (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting robust feature representations is one of the key challenges in object re-identification (ReID). Although convolutional neural network (CNN)-based methods have achieved great success, they only process one local neighborhood at a time and suffer from loss of detail caused by convolution and downsampling operators (e.g. pooling and strided convolution). To overcome these limitations, we propose a pure transformer-based object ReID framework named TransReID. Specifically, we first encode an image as a sequence of patches and build a transformer-based strong baseline with a few critical improvements, which achieves results competitive with CNN-based methods on several ReID benchmarks. To further enhance robust feature learning in the context of transformers, two novel modules are carefully designed. (i) The jigsaw patch module (JPM) rearranges the patch embeddings via shift and patch shuffle operations, generating robust features with improved discrimination ability and more diversified coverage. (ii) The side information embedding (SIE) mitigates feature bias towards camera/view variations by plugging in learnable embeddings that incorporate these non-visual clues. To the best of our knowledge, this is the first work to adopt a pure transformer for ReID research. Experimental results of TransReID are superior and promising, achieving state-of-the-art performance on both person and vehicle ReID benchmarks. Code is available at https://github.com/heshuting555/TransReID.
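The abstract's two modules can be illustrated with a minimal sketch. The JPM applies a cyclic shift to the patch-embedding sequence and then shuffles the patches into groups, while the SIE adds a learnable camera/view embedding to every patch. The grouping strategy below (strided interleaving) and the function names are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def jigsaw_rearrange(patches, shift=5, num_groups=4):
    """JPM sketch: shift-then-shuffle rearrangement of patch embeddings.

    patches: (N, D) array of patch embeddings (class token excluded).
    First the sequence is cyclically shifted by `shift` positions, then
    split into `num_groups` groups; strided slicing here stands in for
    the paper's patch shuffle operation (an assumption of this sketch).
    Each group would feed a separate transformer head for a local feature.
    """
    shifted = np.concatenate([patches[shift:], patches[:shift]], axis=0)
    # patch i goes to group i % num_groups, scattering neighbors apart
    return [shifted[g::num_groups] for g in range(num_groups)]

def add_side_information(patches, side_embed, lam=1.0):
    """SIE sketch: add a learnable camera/view embedding to every patch.

    side_embed: (D,) embedding looked up by camera or view ID; lam is a
    scalar weight balancing the non-visual clue against visual content.
    """
    return patches + lam * side_embed

# Toy usage: 10 patches with 8-dim embeddings.
patches = np.arange(80, dtype=float).reshape(10, 8)
groups = jigsaw_rearrange(patches, shift=3, num_groups=4)
sie_out = add_side_information(patches, np.ones(8), lam=2.0)
```

In the full model, `side_embed` would be a row of a trainable embedding table indexed by the camera/view ID, and each JPM group would be prepended with a shared class token before re-entering the transformer.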
Pages: 14993-15002
Page count: 10