SAT: Scale-Augmented Transformer for Person Search

Cited by: 12
Authors
Fiaz, Mustansar [1 ]
Cholakkal, Hisham [1 ]
Anwer, Rao Muhammad [1 ]
Khan, Fahad Shahbaz [1 ]
Affiliations
[1] Mohamed bin Zayed Univ Artificial Intelligence, Dept Comp Vis, Abu Dhabi, U Arab Emirates
Source
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023
DOI
10.1109/WACV56688.2023.00480
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Person search is a challenging computer vision problem where the objective is to simultaneously detect and re-identify a target person from a gallery of whole scene images captured by multiple cameras. Here, the challenges of the underlying detection and re-identification tasks need to be addressed along with the joint optimization of these two tasks. In this paper, we propose a three-stage cascaded Scale-Augmented Transformer (SAT) person search framework. In the three-stage design of our SAT framework, the first stage performs person detection, whereas the last two stages perform both detection and re-identification. Considering the contradictory nature of detection and re-identification, in the last two stages we introduce separate norm feature embeddings for the two tasks to reconcile the relationship between them in a joint person search model. Our SAT framework benefits from the attributes of both convolutional neural networks and transformers by introducing a convolutional encoder and a scale modulator within each stage. Here, the convolutional encoder increases the generalization ability of the model, whereas the scale modulator performs context aggregation at different granularity levels to help handle pose and scale variations within a region of interest. To further improve performance under occlusion, we apply shifting augmentation operations at each granularity level within the scale modulator. Experimental results on the challenging CUHK-SYSU [35] and PRW [47] datasets demonstrate the favorable performance of our method compared to state-of-the-art methods. Our source code and trained models are available at this https URL.
Pages: 4809-4818
Number of pages: 10
Related papers
48 in total
  • [1] Cao J., 2021, IEEE Trans. Pattern Anal. Mach. Intell.
  • [2] PSTR: End-to-End One-Step Person Search With Transformers
    Cao, Jiale
    Pang, Yanwei
    Anwer, Rao Muhammad
    Cholakkal, Hisham
    Xie, Jin
    Shah, Mubarak
    Khan, Fahad Shahbaz
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9448 - 9457
  • [3] Carion Nicolas, 2020, ARXIV
  • [4] Chen D., 2018, P EUR C COMP VIS, P1
  • [5] Chen D, 2020, AAAI CONF ARTIF INTE, V34, P10518
  • [6] Norm-Aware Embedding for Efficient Person Search
    Chen, Di
    Zhang, Shanshan
    Yang, Jian
    Schiele, Bernt
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 12612 - 12621
  • [7] Chen Shihui, 2021, ARXIV211114316
  • [8] Chu X., 2021, ARXIV210210882
  • [9] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [10] Dong Wenkai, 2020, P IEEE C COMP VIS PA