An encoder-decoder network for crowd counting based on multi-scale attention mechanism

Cited by: 0

Authors:

Chuang H.-H. [1]

Chen Y.-C. [1]

Lin C.H. [1]

Affiliation:

[1] Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, No. 43, Keelung Rd., Da’an Dist., Taipei

Source:

Multimedia Tools and Applications | 2025 / Vol. 84 / Issue 03

Keywords:

Attention mechanism; Crowd counting; Density estimation; Multi-scale attention; Skip-connection;

DOI:

10.1007/s11042-024-19055-5

Abstract:

Crowd counting is a challenging computer vision task that is widely used in video surveillance and public safety applications. As camera resolution increases and crowd images grow more complex, accurately predicting crowd density and crowd count has become an important problem. Recent CNN-based density estimation methods have shown effectiveness in densely populated scenes. In this paper, we present a novel approach to crowd counting through the development of an Encoder-Decoder Multi-Scale Attention Network. Our approach uses the U-net architecture as the backbone network, strengthened by the integration of an attention mechanism. We apply a multi-scale attention method to each layer of the U-net backbone so that the network extracts features that focus on the crowds rather than on the background of the images. The attention mechanism and the skip-connections adjust the weights of the feature maps while maintaining features at different scales. Extensive experiments on the ShanghaiTech Part_A and Part_B and UCF-QNRF datasets demonstrate that our network outperforms existing methods in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE): ShanghaiTech Part_A (MAE/RMSE: 60.0/104.9), Part_B (MAE/RMSE: 7.8/13.8), and UCF-QNRF (MAE/RMSE: 98.6/179.7). © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
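The abstract reports MAE and RMSE without defining them. As a minimal sketch, assuming the definitions commonly used in crowd counting (per-image error between predicted and ground-truth counts, with the predicted count taken as the sum of the estimated density map), the two metrics could be computed as follows. The function names and the toy values are illustrative assumptions and do not come from the paper.

```python
import math


def count_from_density(density_map):
    """Estimated head count: the sum of all values in a 2D density map (assumed convention)."""
    return sum(sum(row) for row in density_map)


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error between predicted and ground-truth counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    return math.sqrt(mean_sq)


# Toy values for illustration only; these are not results from the paper.
predicted_counts = [10.0, 20.0, 30.0]
actual_counts = [12.0, 18.0, 33.0]
print(mae(predicted_counts, actual_counts))
print(rmse(predicted_counts, actual_counts))
```

Because RMSE squares each error before averaging, it penalizes large per-image errors more heavily than MAE, which is why the two values are usually reported together.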

Pages: 1187-1210

Page count: 23

