MSGSA: Multi-Scale Guided Self-Attention Network for Crowd Counting

Cited by: 4
Authors
Sun, Yange [1, 2]
Li, Meng [1]
Guo, Huaping [1, 2]
Zhang, Li [1]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Res Ctr Precis Sensing & Control, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
crowd counting; self-attention; convolutional neural networks; multi-scale feature
DOI
10.3390/electronics12122631
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Crowd counting with convolutional neural networks (CNNs) has made significant progress in recent years; however, effectively handling scale variation and complex backgrounds remains challenging. To address these challenges, we propose a novel Multi-Scale Guided Self-Attention (MSGSA) network that uses self-attention mechanisms to capture multi-scale contextual information for crowd counting. The MSGSA network consists of three key modules: a Feature Pyramid Module (FPM), a Scale Self-Attention Module (SSAM), and a Scale-aware Feature Fusion (SFA) module. By integrating self-attention at multiple scales, the proposed method captures both global and local contextual information, which improves counting accuracy. We conducted extensive experiments on multiple benchmark datasets, and the results show that our method outperforms most existing methods in both counting accuracy and the quality of the generated density maps. The proposed MSGSA network offers a promising direction for efficient and accurate crowd counting against complex backgrounds.
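The idea of applying self-attention at several scales and then fusing the results can be illustrated with a minimal NumPy sketch. This is not the authors' FPM, SSAM, or SFA: the abstract gives no implementation details, so everything here is an assumption made for illustration. In particular, average pooling stands in for a feature pyramid, the attention uses identity projections with no learned weights, and fusion is a plain average. The function names `self_attention`, `avg_pool`, `upsample_nearest`, and `multi_scale_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat):
    # Scaled dot-product self-attention over the spatial positions of a
    # (C, H, W) feature map. Assumption: identity query/key/value
    # projections, so there are no learned parameters.
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T            # (N, C), one token per pixel
    scores = tokens @ tokens.T / np.sqrt(c)      # (N, N) pairwise similarity
    attn = softmax(scores, axis=-1)              # each row sums to 1
    out = attn @ tokens                          # (N, C) context-mixed tokens
    return out.T.reshape(c, h, w)

def avg_pool(feat, k):
    # Non-overlapping k x k average pooling; H and W must be divisible by k.
    c, h, w = feat.shape
    assert h % k == 0 and w % k == 0
    return feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample_nearest(feat, k):
    # Nearest-neighbour upsampling by an integer factor k.
    return feat.repeat(k, axis=1).repeat(k, axis=2)

def multi_scale_attention(feat, scales=(1, 2, 4)):
    # Assumed stand-in for "pyramid -> per-scale attention -> fusion":
    # attend at each pooled resolution, upsample back, then average.
    outputs = []
    for k in scales:
        pooled = feat if k == 1 else avg_pool(feat, k)
        attended = self_attention(pooled)
        outputs.append(attended if k == 1 else upsample_nearest(attended, k))
    return np.mean(outputs, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 16, 16))      # toy (C, H, W) feature map
    fused = multi_scale_attention(features)
    print(fused.shape)
```

Coarser scales let each position attend over a wider spatial context at lower cost, while the full-resolution branch keeps local detail; a learned fusion would normally replace the plain average used here.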
Pages: 14