An encoder-decoder network for crowd counting based on multi-scale attention mechanism

Authors
Chuang H.-H. [1 ]
Chen Y.-C. [1 ]
Lin C.H. [1 ]
Affiliation
[1] Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, No. 43, Keelung Rd., Da’an Dist., Taipei
Keywords
Attention mechanism; Crowd counting; Density estimation; Multi-scale attention; Skip-connection
DOI
10.1007/s11042-024-19055-5
Abstract
Crowd counting is a challenging computer vision task that is widely used in video surveillance and public safety applications. As camera resolution increases and crowd images grow more complex, accurately predicting crowd density and crowd count has become an important problem. Recent CNN-based density estimation methods have proven effective in densely populated scenes. In this paper, we present a novel approach to crowd counting: an Encoder-Decoder Multi-Scale Attention Network. Our approach uses the U-net architecture as the backbone network and integrates an attention mechanism into it. We apply a multi-scale attention method to each layer of the U-net backbone so that the network extracts features that focus on the crowds rather than on the image background. The attention mechanism and the skip-connections adjust the weights of the feature maps while preserving features at different scales. Extensive experiments on the ShanghaiTech Part_A and Part_B and UCF-QNRF datasets show that our network outperforms existing methods in Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE): ShanghaiTech Part_A (MAE/RMSE: 60.0/104.9), Part_B (MAE/RMSE: 7.8/13.8), and UCF-QNRF (MAE/RMSE: 98.6/179.7). © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
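The abstract reports results as MAE and RMSE over per-image crowd counts. The following is a minimal sketch of how such metrics are commonly computed in density-estimation crowd counting, assuming the predicted count of an image is the sum of its predicted density map. The function names (`count_from_density`, `mae_rmse`) and the toy data are hypothetical and are not taken from the paper.

```python
import math


def count_from_density(density_map):
    """Estimate the crowd count as the sum of all density-map values.

    Assumption: density_map is a 2D list of non-negative floats.
    """
    return sum(sum(row) for row in density_map)


def mae_rmse(predicted_counts, true_counts):
    """Return (MAE, RMSE) over paired per-image counts."""
    errors = [p - t for p, t in zip(predicted_counts, true_counts)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse


# Toy data for illustration only (not from any dataset in the abstract).
density_maps = [
    [[0.5, 0.5], [1.0, 0.0]],  # sums to 2.0
    [[1.0, 1.0], [1.5, 1.5]],  # sums to 5.0
]
true_counts = [3, 4]

predicted = [count_from_density(d) for d in density_maps]
mae, rmse = mae_rmse(predicted, true_counts)
print(predicted, mae, rmse)
```

Because RMSE squares each error before averaging, it penalizes large per-image miscounts more heavily than MAE does, which is why the two values are usually reported together.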
Pages: 1187-1210
Page count: 23