SA-FPN: An effective feature pyramid network for crowded human detection

Times cited: 101
Authors
Zhou, Xinxin [1]
Zhang, Long [1]
Affiliation
[1] Northeast Electric Power University, School of Computer Science, Jilin 132012, Jilin, People's Republic of China
Keywords
Object detection; Human detection; Crowd; Convolutional neural networks; Feature pyramid networks; Pedestrian detection
DOI
10.1007/s10489-021-03121-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Crowded scenarios not only contain instances at various scales but also introduce a variety of occlusion patterns, ranging from non-occluded to heavily occluded cases, which make the visible shapes of instances differ. Together, these factors make it hard for human detectors to perform well in such scenes. Feature pyramid networks (FPN), an indispensable part of generic object detectors, can significantly boost detection performance for objects at different scales. Therefore, in this paper, we equip FPN with a multi-scale feature fusion technique and attention mechanisms to improve the performance of human detection in crowded scenarios. First, we design a feature pyramid structure with a refined hierarchical-split block, referred to as Scale-FPN, which better handles the challenging problem of scale variation across object instances. Second, we propose an attention-based lateral connection (ALC) module with spatial and channel attention mechanisms to replace the lateral connection in the FPN; it enhances the representational ability of feature maps through rich spatial and semantic information and enables detectors to focus on the important features of occlusion patterns. Additionally, we adopt a bottom-up path augmentation (BPA) module to exploit the features produced by the Scale-FPN and ALC modules. To verify the effectiveness of the proposed method, we combine Scale-FPN, ALC and BPA into SA-FPN and integrate it into the architecture of a crowded human detector. Experiments on the challenging CrowdHuman benchmark validate the effectiveness of SA-FPN. Specifically, it improves on the state-of-the-art result of CrowdDet, reducing MR^-2 (log-average miss rate, where lower is better) from 41.4% to 39.9%, which indicates that the detector with SA-FPN produces fewer false positives.
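The MR^-2 figures quoted above follow the pedestrian-detection convention in which lower values are better. As a rough illustration of how such a number is obtained, the sketch below computes a log-average miss rate from a detector's (FPPI, miss rate) operating points, assuming the Caltech-style definition: the geometric mean of miss rates sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0. The function name `log_average_miss_rate` and its inputs are hypothetical, and this is not the official CrowdHuman evaluation code, which may sample the curve differently.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float]) -> float:
    """Hypothetical sketch of MR^-2.

    Assumes `fppi` is sorted in ascending order (i.e. the confidence threshold
    decreases along the curve) and `miss_rate[i]` is the miss rate at `fppi[i]`.
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have equal length")

    # Nine reference FPPI values: 10^-2, 10^-1.75, ..., 10^0.
    refs = [10 ** (-2 + 0.25 * i) for i in range(9)]

    log_sum = 0.0
    for ref in refs:
        # Take the miss rate at the last operating point whose FPPI does not
        # exceed the reference; if none qualifies, every instance counts as missed.
        sampled = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                sampled = m
        # Clamp to avoid log(0) when a sampled miss rate is exactly zero.
        log_sum += math.log(max(sampled, 1e-10))

    # Geometric mean of the nine sampled miss rates ("log-average").
    return math.exp(log_sum / len(refs))
```

For example, a curve whose miss rate stays at 0.40 across the whole FPPI range scores 0.40 (40.0%). Because each sample is taken at a fixed FPPI budget, a lower score means fewer missed instances for the same number of false positives per image; the log-space averaging keeps the low-FPPI end of the curve from being swamped by the high-FPPI end.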
Pages: 12556-12568
Number of pages: 13
References
59 in total
  • [1] [Anonymous], P IEEE C COMP VIS PA, DOI 10.1109/CVPR.2016.141
  • [2] [Anonymous], 2016, ASIAN C COMP VIS
  • [3] Soft-NMS - Improving Object Detection With One Line of Code
    Bodla, Navaneeth
    Singh, Bharat
    Chellappa, Rama
    Davis, Larry S.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5562 - 5570
  • [4] Computer vision and deep learning techniques for pedestrian detection and tracking: A survey
    Brunetti, Antonio
    Buongiorno, Domenico
    Trotta, Gianpaolo Francesco
    Bevilacqua, Vitoantonio
    [J]. NEUROCOMPUTING, 2018, 300 : 17 - 33
  • [5] Chi S, 2019, RELATIONAL LEARNING
  • [6] Detection in Crowded Scenes: One Proposal, Multiple Predictions
    Chu, Xuangeng
    Zheng, Anlin
    Zhang, Xiangyu
    Sun, Jian
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 12211 - 12220
  • [7] Ding E, 2020, HS RESNET HIERARCHIC
  • [8] Dollár P, 2009, PROC CVPR IEEE, P304, DOI 10.1109/CVPRW.2009.5206631
  • [9] Human detection from images and videos: A survey
    Nguyen, Duc Thanh
    Li, Wanqing
    Ogunbona, Philip O.
    [J]. PATTERN RECOGNITION, 2016, 51 : 148 - 175
  • [10] The PASCAL Visual Object Classes Challenge: A Retrospective
    Everingham, Mark
    Eslami, S. M. Ali
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 111 (01) : 98 - 136