Counting in congested crowd scenes with hierarchical scale-aware encoder-decoder network

Cited by: 5
Authors
Han, Run [1, 2]
Qi, Ran [1]
Lu, Xuequan [3, 4]
Huang, Lei
Lyu, Lei [1, 2]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Peoples R China
[2] Shandong Prov Key Lab Novel Distributed Comp Softw, Jinan, Peoples R China
[3] La Trobe Univ, Dept Comp Sci & IT, Melbourne, Vic, Australia
[4] Ocean Univ China, Coll Informat Sci & Engn, Qingdao, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Crowd counting; Density estimation; Scale-aware network; Encoder-decoder network
DOI
10.1016/j.eswa.2023.122087
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As an indispensable component of intelligent monitoring systems, crowd counting plays a crucial role in many fields, particularly crowd management and control during the COVID-19 pandemic. Although many methods have achieved promising results, crowd scale variation and noise interference in congested scenes remain open problems. In this paper, we propose a novel Hierarchical Scale-aware Encoder-Decoder Network (HSED-Net) for single-image crowd counting that handles scale variation and noise interference and thereby generates high-quality density maps. HSED-Net is an encoder-decoder architecture with two core networks: the Scale-Aware Encoding network (SAEnet) and the Multi-path Aggregation Decoding network (MADnet). The SAEnet extracts rich multi-scale crowd features, employing cascaded scale-aware encoding branches that collaboratively produce high-resolution feature representations. During the encoding phase, two adaptive weight generators filter the crowd features along different dimensions to resist noise interference. Instead of fusing multi-scale and multi-level features indiscriminately, the MADnet adopts a multi-path adaptive fusion strategy and selectively emphasizes the more appropriate features through spatial and channel guidance modules, further improving the quality of the density maps and the robustness of the network. Extensive experiments on four challenging datasets demonstrate the superiority of HSED-Net.
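To make the idea of selective fusion through spatial and channel guidance more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how the guidance modules or adaptive weight generators are built, so everything here is an assumption for illustration: the function names `channel_gate`, `spatial_gate`, `gated_fusion`, and `count_from_density` are hypothetical, and the gates are parameter-free (pooling followed by a sigmoid), whereas a trained network would learn these weights. The last function shows the standard density-estimation convention that the predicted count is the sum of the density map.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat):
    """Hypothetical channel guidance: one weight per channel.

    feat has shape (C, H, W). The weight comes from global average
    pooling over the spatial axes; a real module would learn this mapping.
    """
    pooled = feat.mean(axis=(1, 2))            # shape (C,)
    return sigmoid(pooled)[:, None, None]      # shape (C, 1, 1)


def spatial_gate(feat):
    """Hypothetical spatial guidance: one weight per pixel.

    The weight comes from averaging over the channel axis.
    """
    pooled = feat.mean(axis=0, keepdims=True)  # shape (1, H, W)
    return sigmoid(pooled)


def gated_fusion(paths):
    """Fuse same-shaped feature maps from several paths.

    Each path is reweighted by its own channel and spatial gates before
    summation, instead of being added indiscriminately.
    """
    fused = np.zeros_like(paths[0])
    for feat in paths:
        fused += feat * channel_gate(feat) * spatial_gate(feat)
    return fused


def count_from_density(density):
    """Estimated crowd count is the integral (sum) of the density map."""
    return float(density.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    paths = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
    print(gated_fusion(paths).shape)

    density = np.zeros((8, 8))
    density[1, 2] = 1.0   # one annotated head
    density[5, 5] = 1.0   # another annotated head
    print(count_from_density(density))
```

In this sketch the gates rely on NumPy broadcasting: a `(C, 1, 1)` channel weight and a `(1, H, W)` spatial weight both expand to the `(C, H, W)` feature shape, so each element is scaled by the product of its channel and pixel weights.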
Pages: 13