Crowd Counting Based on Multiscale Spatial Guided Perception Aggregation Network

Cited by: 6
Authors
Chen, Zhangping [1]
Zhang, Shuo [1]
Zheng, Xiaoqing [1]
Zhao, Xiaodong [1]
Kong, Yaguang [1]
Affiliations
[1] Hangzhou Dianzi Univ, Dept Automat, Hangzhou 310018, Peoples R China
Keywords
Crowd counting; feature fusion; regional loss; space guidance; neural network
DOI
10.1109/TNNLS.2023.3304348
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Crowd counting has received extensive attention in computer vision, and methods based on deep convolutional neural networks (CNNs) have made great progress on this task. However, scale variation, nonuniform distribution, complex backgrounds, and occlusion in crowded scenes still limit the performance of these networks. To overcome these challenges, this article proposes a multiscale spatial guidance perception aggregation network (MGANet) for efficient and accurate crowd counting. MGANet consists of three parts: a multiscale feature extraction network (MFEN), a spatial guidance network (SGN), and an attention fusion network (AFN). To alleviate scale variation, MFEN enhances scale adaptability and captures multiscale features in scenes with drastic scale changes. To address nonuniform crowd distribution and complex backgrounds, the SGN is proposed; it comprises a spatial context network (SCN) and a guidance perception network (GPN). SCN captures detailed semantic information between the positions of the multiscale features extracted by MFEN and improves the exploration of deep structured information. It also establishes dependencies between long-range spatial contexts to enlarge the receptive field. GPN enhances information exchange between channels and guides the network to select appropriate multiscale features and spatial context semantic features. AFN adaptively weighs the importance of these different features and derives accurate and effective feature representations from them. In addition, this article proposes a novel region-adaptive loss function that optimizes the image regions with large recognition errors and alleviates the inconsistency between the training target and the evaluation metric.
To evaluate the proposed method, extensive experiments were carried out on challenging benchmarks including ShanghaiTech Part A and Part B, UCF-CC-50, UCF-QNRF, and JHU-CROWD++. Experimental results show that the proposed method performs well on all four datasets. In particular, on the ShanghaiTech Part A and Part B, UCF-QNRF, and JHU-CROWD++ datasets, the proposed method achieves superior recognition performance and better robustness than state-of-the-art methods.
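The abstract does not give the form of the region-adaptive loss, so the following is only an illustrative sketch of the general idea it describes: penalizing the image regions whose counts are predicted worst. Everything here is an assumption made for illustration, including the function name `region_weighted_loss`, the fixed grid, the `top_k` selection, and the `region_weight` factor; it is not the loss defined in the article. Plain Python lists stand in for density maps so the example has no dependencies.

```python
def region_weighted_loss(pred, target, grid=(2, 2), top_k=1, region_weight=0.5):
    """Pixel-wise MSE plus a penalty on the worst-predicted grid cells.

    Hypothetical illustration only: `pred` and `target` are 2-D lists
    (density maps) whose sums approximate the number of people.
    """
    h, w = len(pred), len(pred[0])
    gh, gw = grid
    if h % gh or w % gw:
        raise ValueError("map size must be divisible by the grid")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    cell_h, cell_w = h // gh, w // gw

    squared_sum = 0.0
    # Signed count error accumulated per grid cell
    cell_diff = [[0.0] * gw for _ in range(gh)]
    for i in range(h):
        for j in range(w):
            d = pred[i][j] - target[i][j]
            squared_sum += d * d
            cell_diff[i // cell_h][j // cell_w] += d

    # Base term: mean squared error over all pixels
    pixel_loss = squared_sum / (h * w)

    # Region term: mean absolute count error of the top_k worst cells
    cell_err = sorted((abs(d) for row in cell_diff for d in row), reverse=True)
    worst = cell_err[:top_k]
    region_loss = sum(worst) / len(worst)

    return pixel_loss + region_weight * region_loss
```

The region term works on counts (sums over a cell) instead of per-pixel values, which is one way a training objective can be brought closer to a count-based evaluation metric. In a real training setup the same computation would be written with differentiable tensor operations.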
Pages: 17465-17478
Page count: 14