HAT: Hierarchical Aggregation Transformers for Person Re-identification

Cited by: 93
Authors
Zhang, Guowen [1 ]
Zhang, Pingping [1 ]
Qi, Jinqing [1 ]
Lu, Huchuan [1 ,2 ]
Affiliations
[1] Dalian Univ Technol, Dalian, Peoples R China
[2] Pengcheng Lab, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
National Natural Science Foundation of China
Keywords
Person Re-identification; Transformers; Deep Feature Aggregation
DOI
10.1145/3474085.3475202
CLC classification
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, with the advance of deep Convolutional Neural Networks (CNNs), person Re-Identification (Re-ID) has witnessed great success in various applications. However, due to the limited receptive fields of CNNs, it remains challenging to extract globally discriminative representations of persons captured by non-overlapping cameras. Meanwhile, Transformers demonstrate a strong ability to model long-range dependencies in spatial and sequential data. In this work, we take advantage of both CNNs and Transformers and propose a novel learning framework, the Hierarchical Aggregation Transformer (HAT), for high-performance image-based person Re-ID. To this end, we first propose a Deeply Supervised Aggregation (DSA) module that recurrently aggregates hierarchical features from CNN backbones. With multi-granularity supervision, DSA enhances multi-scale features for person retrieval, which clearly distinguishes it from previous methods. We then introduce a Transformer-based Feature Calibration (TFC) module that integrates low-level detail information as a global prior for high-level semantic information. TFC is inserted into each level of the hierarchical features, yielding significant performance improvements. To the best of our knowledge, this is the first work to exploit both CNNs and Transformers for image-based person Re-ID. Comprehensive experiments on four large-scale Re-ID benchmarks demonstrate that our method outperforms several state-of-the-art methods. The code is released at https://github.com/AI-Zhpp/HAT.
Pages: 516-525
Page count: 10
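
For readers who want a concrete picture of the pipeline described in the abstract, below is a minimal, illustrative PyTorch sketch, not the authors' released code (see the linked repository for that). It assumes a CNN backbone whose multi-level feature maps have already been projected to a common channel dimension, uses a standard transformer encoder layer as a stand-in for the Transformer-based Feature Calibration, and attaches a per-level classifier to mimic the multi-granularity supervision of the Deeply Supervised Aggregation. All module names, the channel width (256), and the identity count (751, as in Market-1501) are assumptions made for illustration.

# Minimal illustrative sketch (not the authors' released code) of pairing a CNN
# backbone with transformer-based calibration of hierarchical features.
import torch
import torch.nn as nn

class TransformerCalibration(nn.Module):
    """Fuses low-level (detail) tokens with high-level (semantic) tokens by
    self-attending over their concatenation, then keeps the semantic tokens."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)

    def forward(self, low_tokens, high_tokens):
        fused = self.encoder(torch.cat([low_tokens, high_tokens], dim=1))
        return fused[:, low_tokens.size(1):]  # calibrated high-level tokens

class HierarchicalAggregator(nn.Module):
    """Recurrently aggregates multi-level CNN features; each stage gets its own
    identity classifier, mimicking multi-granularity (deep) supervision."""
    def __init__(self, dim=256, num_levels=3, num_ids=751):
        super().__init__()
        self.calibrations = nn.ModuleList(
            [TransformerCalibration(dim) for _ in range(num_levels - 1)])
        self.classifiers = nn.ModuleList(
            [nn.Linear(dim, num_ids) for _ in range(num_levels - 1)])

    def forward(self, feature_maps):
        # feature_maps: list of (B, dim, H_i, W_i) tensors, shallow to deep,
        # already projected to a shared channel dimension `dim`.
        tokens = [f.flatten(2).transpose(1, 2) for f in feature_maps]  # (B, N_i, dim)
        state, logits = tokens[0], []
        for calib, cls, high in zip(self.calibrations, self.classifiers, tokens[1:]):
            state = calib(state, high)             # low-level detail acts as a prior
            logits.append(cls(state.mean(dim=1)))  # per-level ID supervision
        return state, logits

if __name__ == "__main__":
    # Toy feature maps standing in for a ResNet-like backbone's stage outputs.
    feats = [torch.randn(2, 256, 24, 8),
             torch.randn(2, 256, 12, 4),
             torch.randn(2, 256, 6, 2)]
    embedding, per_level_logits = HierarchicalAggregator()(feats)
    print(embedding.shape, [l.shape for l in per_level_logits])

In this sketch, the pooled calibrated tokens would serve as the retrieval embedding at test time, while the per-level classifiers only supply auxiliary losses during training.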