MSFFA: a multi-scale feature fusion and attention mechanism network for crowd counting

Cited by: 0
Authors
Zhaoxin Li
Shuhua Lu
Yishan Dong
Jingyuan Guo
Affiliation
[1] People's Public Security University of China, College of Information and Cyber Security
Source
The Visual Computer | 2023 / Vol. 39
Keywords
Crowd counting; Multi-scale; Attention mechanism; Mixed loss function
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Crowd counting has attracted growing attention in the computer vision community in recent years because of its wide applications in public safety and commercial planning. It remains a challenging task in realistic scenes, however, owing to large scale variations and complex background interference. In this paper, we propose an efficient end-to-end CNN with multi-scale feature fusion and an attention mechanism, named MSFFA. The network consists of three parts: a front-end low-level feature extractor, a mid-end multi-scale feature fusion operator, and a back-end density map generator. Most significantly, in the mid-end we stack three MSFF blocks with residual connections, which both lets the network capture large, continuous scale variations and enhances information transmission. In addition, a global attention module is employed to extract effective features in scenes with complex backgrounds. We evaluate our method on three public datasets: ShanghaiTech, UCF-QNRF and UCF_CC_50. Experimental results show that it outperforms some existing advanced approaches, indicating excellent accuracy and stability.
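The three-part layout described in the abstract can be illustrated with a toy sketch in plain Python on a 1-D signal. Everything inside the stages is an assumption made for illustration only: the moving-average `smooth` stands in for convolutions, the three radii in `msff_block`, the softmax-style `global_attention`, and the placement of attention after the stacked blocks are all hypothetical, since the abstract specifies none of them. Only the overall shape (front-end, three residual MSFF blocks, attention, back-end density map whose sum is the count estimate) follows the text.

```python
import math
from typing import Callable, List

Feature = List[float]
Stage = Callable[[Feature], Feature]


def smooth(x: Feature, radius: int) -> Feature:
    """Moving average of the given radius; a stand-in for one convolution scale."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def msff_block(x: Feature) -> Feature:
    """Hypothetical multi-scale fusion: average the responses at three scales."""
    branches = [smooth(x, r) for r in (1, 2, 4)]  # radii are assumed values
    return [sum(vals) / len(vals) for vals in zip(*branches)]


def with_residual(block: Stage) -> Stage:
    """Wrap a block so its output is added to its input (residual connection)."""
    return lambda x: [a + b for a, b in zip(x, block(x))]


def global_attention(x: Feature) -> Feature:
    """Assumed attention: softmax weights over all positions, mean weight 1."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [v * (e / total) * len(x) for v, e in zip(x, exps)]


def front_end(x: Feature) -> Feature:
    """Placeholder low-level feature extractor."""
    return smooth(x, 1)


def back_end(x: Feature) -> Feature:
    """Placeholder density map generator: clamp to non-negative values."""
    return [max(0.0, v) for v in x]


def msffa_sketch(x: Feature) -> Feature:
    feats = front_end(x)
    for _ in range(3):  # three stacked MSFF blocks, as stated in the abstract
        feats = with_residual(msff_block)(feats)
    feats = global_attention(feats)  # placement is an assumption
    return back_end(feats)


density = msffa_sketch([0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0])
estimated_count = sum(density)  # count estimate = sum over the density map
```

In density-map-based counting, the predicted count is obtained by summing the density map, which is why the last line reduces the output to a single number. A real implementation would replace each placeholder with learned convolutional layers.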
Pages: 1045-1056
Page count: 11