An encoder-decoder network for crowd counting based on multi-scale attention mechanism

Cited by: 0
Authors
Chuang H.-H. [1 ]
Chen Y.-C. [1 ]
Lin C.H. [1 ]
Affiliation
[1] Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, No. 43, Keelung Rd., Da’an Dist., Taipei
Keywords
Attention mechanism; Crowd counting; Density estimation; Multi-scale attention; Skip-connection
DOI
10.1007/s11042-024-19055-5
Abstract
Crowd counting is a challenging computer vision task that is widely used in video surveillance and public safety applications. As camera resolution and the complexity of crowd images increase, accurately predicting crowd density and crowd count has become an important problem. Recent CNN-based density estimation methods have proven effective in densely populated scenes. In this paper, we present a novel approach to crowd counting: an Encoder-Decoder Multi-Scale Attention Network. Our approach uses the U-net architecture as the backbone network and integrates an attention mechanism into it. We apply a multi-scale attention method to each layer of the U-net backbone so that the network extracts features that focus on the crowds rather than on the image background. The attention mechanism and the skip-connections adjust the weights of the feature maps while preserving features at different scales. Extensive experiments on the ShanghaiTech Part_A and Part_B and UCF-QNRF datasets show that our network outperforms existing methods in Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE): ShanghaiTech Part_A (MAE/RMSE: 60.0/104.9), Part_B (MAE/RMSE: 7.8/13.8), and UCF-QNRF (MAE/RMSE: 98.6/179.7). © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
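The MAE and RMSE figures quoted in the abstract are per-image count errors averaged over a test set. As a minimal sketch of how such metrics are commonly computed in density-estimation crowd counting (not the authors' evaluation code), the example below sums each predicted density map to get a count and then compares it with the ground-truth count. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def count_from_density(density_map):
    """Estimated count = sum over all pixels of a density map (list of rows)."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean Absolute Error between predicted and ground-truth counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def rmse(pred_counts, true_counts):
    """Root Mean Squared Error between predicted and ground-truth counts."""
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (assumed for illustration only): two tiny 2x2 "density maps".
density_maps = [
    [[0.5, 0.5], [1.0, 0.0]],  # sums to 2.0
    [[1.0, 1.0], [1.0, 2.0]],  # sums to 5.0
]
true_counts = [3, 5]

pred_counts = [count_from_density(d) for d in density_maps]
mae_value = mae(pred_counts, true_counts)
rmse_value = rmse(pred_counts, true_counts)
print(mae_value, rmse_value)
```

Because RMSE squares each error before averaging, it penalizes large per-image miscounts more heavily than MAE, which is why the two values are usually reported together.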
Pages: 1187-1210
Number of pages: 23
Related papers
50 in total
[31]   Symmetry Encoder-Decoder Network with Attention Mechanism for Fast Video Object Segmentation [J].
Guo, Mingyue ;
Zhang, Dejun ;
Sun, Jun ;
Wu, Yiqi .
SYMMETRY-BASEL, 2019, 11 (08)
[32]   MobileCount: An efficient encoder-decoder framework for real-time crowd counting [J].
Wang, Peng ;
Gao, Chenyu ;
Wang, Yang ;
Li, Hui ;
Gao, Ye .
NEUROCOMPUTING, 2020, 407 :292-299
[33]   An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting [J].
Zhang, Peng ;
Lei, Weimin ;
Zhao, Xinlei ;
Dong, Lijia ;
Lin, Zhaonan .
SENSORS, 2023, 23 (18)
[34]   STOCHASTIC MULTI-SCALE AGGREGATION NETWORK FOR CROWD COUNTING [J].
Wang, Mingjie ;
Cai, Hao ;
Zhou, Jun ;
Gong, Minglun .
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, :2008-2012
[35]   Land cover classification of synthetic aperture radar images based on encoder-decoder network with an attention mechanism [J].
Zheng, Nai-Rong ;
Yang, Zi-An ;
Shi, Xian-Zheng ;
Zhou, Ruo-Yi ;
Wang, Feng .
JOURNAL OF APPLIED REMOTE SENSING, 2022, 16 (01)
[36]   Embedding Encoder-Decoder With Attention Mechanism for Monaural Speech Enhancement [J].
Lan, Tian ;
Ye, Wenzheng ;
Lyu, Yilan ;
Zhang, Junyi ;
Liu, Qiao .
IEEE ACCESS, 2020, 8 :96677-96685
[37]   Attention-Based Encoder-Decoder Network for Prediction of Electromagnetic Scattering Fields [J].
Zhang, Ying ;
He, Mang .
2022 IEEE 10TH ASIA-PACIFIC CONFERENCE ON ANTENNAS AND PROPAGATION, APCAP, 2022,
[38]   Attention Aggregation Encoder-Decoder Network Framework for Stereo Matching [J].
Zhang, Yaru ;
Li, Yaqian ;
Kong, Yating ;
Liu, Bin .
IEEE SIGNAL PROCESSING LETTERS, 2020, 27 :760-764
[39]   MULTI-STEP QUANTIZATION OF A MULTI-SCALE NETWORK FOR CROWD COUNTING [J].
Shim, Kyujin ;
Byun, Junyoung ;
Kim, Changick .
2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, :683-687
[40]   Crowd counting in domain generalization based on multi-scale attention and hierarchy level enhancement [J].
Zhou, Jiarui ;
Zhang, Jianming ;
Gui, Yan .
SCIENTIFIC REPORTS, 2025, 15 (01)