MSGSA: Multi-Scale Guided Self-Attention Network for Crowd Counting

Times cited: 4
Authors
Sun, Yange [1, 2]
Li, Meng [1]
Guo, Huaping [1, 2]
Zhang, Li [1]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Res Ctr Precis Sensing & Control, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
crowd counting; self-attention; convolutional neural networks; multi-scale feature;
DOI
10.3390/electronics12122631
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Convolutional neural networks (CNNs) have driven significant progress in crowd counting in recent years; however, handling scale variation and complex backgrounds effectively remains challenging. To address these challenges, we propose a novel Multi-Scale Guided Self-Attention (MSGSA) network that uses self-attention mechanisms to capture multi-scale contextual information for crowd counting. The MSGSA network consists of three key modules: a Feature Pyramid Module (FPM), a Scale Self-Attention Module (SSAM), and a Scale-aware Feature Fusion (SFA). By integrating self-attention mechanisms at multiple scales, the proposed method captures both global and local contextual information, which improves counting accuracy. We conducted extensive experiments on multiple benchmark datasets, and the results demonstrate that our method outperforms most existing methods in terms of counting accuracy and the quality of the generated density maps. The proposed MSGSA network provides a promising direction for efficient and accurate crowd counting against complex backgrounds.
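To make the idea of self-attention applied at several scales more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation and does not reproduce the FPM, SSAM, or SFA modules: the average-pooling pyramid, the random projection weights, the nearest-neighbour upsampling, and the mean fusion are all illustrative assumptions, as are the function names.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool an (H, W, C) feature map with a k x k window and stride k."""
    h, w, c = x.shape
    x = x[: h - h % k, : w - w % k]  # crop so H and W divide evenly by k
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def self_attention(tokens, wq, wk, wv):
    """Scaled dot-product self-attention over an (N, C) token matrix."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights

def upsample_nearest(x, k):
    """Nearest-neighbour upsampling of an (H, W, C) map by an integer factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def multi_scale_attention(fmap, scales, rng):
    """Hypothetical fusion: attend at each scale, upsample, then average."""
    c = fmap.shape[-1]
    outputs = []
    for k in scales:
        pooled = avg_pool(fmap, k)  # coarser map for larger k
        ph, pw, _ = pooled.shape
        # Random projections stand in for learned weights (assumption).
        wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
        attended, _ = self_attention(pooled.reshape(-1, c), wq, wk, wv)
        outputs.append(upsample_nearest(attended.reshape(ph, pw, c), k))
    return np.mean(outputs, axis=0)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 4))  # toy feature map: H=8, W=8, C=4
fused = multi_scale_attention(fmap, scales=(1, 2, 4), rng=rng)
print(fused.shape)  # (8, 8, 4)
```

In this sketch, attention at scale 1 relates every spatial position to every other one, while the coarser scales relate pooled regions, which is one simple way to mix local and global context.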
Pages: 14