Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting

Cited by: 17
Authors
Wang, Xin [1, 2]
Lv, Rongrong [1 ]
Zhao, Yang [2 ]
Yang, Tangwen [1 ]
Ruan, Qiuqi [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Shenzhen Univ, Guangdong Key Lab Intelligent Informat Proc, Shenzhen 518060, Guangdong, Peoples R China
Source
PROCEEDINGS OF 2020 IEEE 15TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2020) | 2020
Keywords
dense context-aware module; hierarchical attention guided; multi-scale extraction; crowd counting
DOI
10.1109/ICSP48669.2020.9321067
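Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract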
Crowd counting aims to predict the number of people in an image and to generate the corresponding density map. It faces several challenges, including varying head scales, diverse crowd distributions across images, and cluttered backgrounds. In this paper, we propose a multi-scale context aggregation network (MSCANet) for crowd counting, built on a single-column encoder-decoder architecture; it consists of an encoder based on a dense context-aware module (DCAM) and a hierarchical attention-guided decoder. To handle scale variation, we construct the DCAM to aggregate multi-scale contextual information by densely connecting dilated convolutions with varying receptive fields. The proposed DCAM can capture rich contextual information in crowd areas thanks to its long-range receptive fields and dense scale sampling. Moreover, to suppress background noise and generate a high-quality density map, we adopt a hierarchical attention-guided mechanism in the decoder. This mechanism helps integrate more useful spatial information from the shallow feature maps of the encoder by introducing multiple supervision based on a semantic attention module (SAM). Extensive experiments demonstrate that the proposed approach achieves better performance than other similar state-of-the-art methods on three challenging benchmark datasets for crowd counting.
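The abstract's claim about "long-range receptive fields and dense scale sampling" can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: it assumes stride-1 3x3 convolutions and a hypothetical set of dilation rates (1, 2, 3), neither of which is given in this record. It relies only on the standard fact that a stride-1 convolution with kernel size k and dilation d widens the receptive field by (k - 1) * d, and it treats dense connection as letting a feature pass through any subset of the layers in order.

```python
from itertools import combinations


def receptive_field(dilations, kernel_size=3):
    """Receptive field (pixels along one axis) of stride-1 dilated
    convolutions applied one after another."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dense_scales(dilations, kernel_size=3):
    """Distinct receptive fields available when layers are densely
    connected, i.e. a feature may skip any of the layers."""
    sizes = set()
    # r = 0 is the path that skips every layer (the raw input, field 1).
    for r in range(len(dilations) + 1):
        for path in combinations(dilations, r):
            sizes.add(receptive_field(path, kernel_size))
    return sorted(sizes)


# Hypothetical dilation rates chosen for illustration, not from the paper.
ASSUMED_DILATIONS = (1, 2, 3)

if __name__ == "__main__":
    print("largest field:", receptive_field(ASSUMED_DILATIONS))
    print("sampled scales:", dense_scales(ASSUMED_DILATIONS))
```

Under these assumed rates the full stack reaches a 13-pixel field, while the dense connections also expose the intermediate fields 1, 3, 5, 7, 9 and 11. A plain sequential stack of the same layers would expose only the final scale to the next stage, which is the intuition behind dense scale sampling.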
Pages: 240-245
Number of pages: 6
Related papers
21 items in total
  • [1] Scale Aggregation Network for Accurate and Efficient Crowd Counting
    Cao, Xinkun
    Wang, Zhipeng
    Zhao, Yanyun
    Su, Fei
    [J]. COMPUTER VISION - ECCV 2018, PT V, 2018, 11209 : 757 - 773
  • [2] Crowd counting with crowd attention convolutional neural network
    Chen, Jiwei
    Su, Wen
    Wang, Zengfu
    [J]. NEUROCOMPUTING, 2020, 382 : 210 - 220
  • [3] Scale Pyramid Network for Crowd Counting
    Chen, Xinya
    Bin, Yanrui
    Sang, Nong
    Gao, Changxin
    [J]. 2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1941 - 1950
  • [4] Gao G., 2020, ARXIV PREPRINT ARXIV
  • [5] Gao J, 2019, IEEE Transactions on Circuits and Systems for Video Technology, P1
  • [6] DADNet: Dilated-Attention-Deformable ConvNet for Crowd Counting
    Guo, Dan
    Li, Kun
    Zha, Zheng-Jun
    Wang, Meng
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1823 - 1832
  • [7] Jiang X., 2019, IEEE T NEUR NET LEAR
  • [8] Crowd Counting and Density Estimation by Trellis Encoder-Decoder Networks
    Jiang, Xiaolong
    Xiao, Zehao
    Zhang, Baochang
    Zhen, Xiantong
    Cao, Xianbin
    Doermann, David
    Shao, Ling
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6126 - 6135
  • [9] CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
    Li, Yuhong
    Zhang, Xiaofan
    Chen, Deming
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1091 - 1100
  • [10] Feature Pyramid Networks for Object Detection
    Lin, Tsung-Yi
    Dollar, Piotr
    Girshick, Ross
    He, Kaiming
    Hariharan, Bharath
    Belongie, Serge
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 936 - 944