GTL-ASENet: global to local adaptive spatial encoder network for crowd counting

Cited by: 0
Authors
Liu, Chengming [1 ]
Hu, Guanzhong [1 ]
Li, Yinghao [1 ]
Gao, Yufei [1 ]
Shi, Lei [1 ]
Affiliations
[1] Zhengzhou Univ, Sch Cyber Sci & Engn, 97 Wenhua St, Zhengzhou 450002, Henan, Peoples R China
Keywords
Crowd counting; Density map; Spatial encoder; Global distribution; Contextual module; SCALE; PEOPLE;
DOI
10.1007/s11042-023-14330-3
CLC number
TP [automation technology, computer technology]
Discipline code
0812
Abstract
Crowd counting from a single image is a challenging task due to perspective distortion and large scale variation in crowd scenes. Many existing methods focus only on local features when creating density maps, which is not effective in handling these challenges. This paper proposes a novel network, the global-to-local adaptive spatial encoder network (GTL-ASENet), which first uses global features to generate a total-structure density map of the population distribution, and then uses local features to refine that map in detail and produce a high-quality density map. To capture global features and local information and to correlate them, we design a contextual module that combines convolutions and transposed convolutions with different kernel sizes. To create a density map that moves from global structure to local detail, two branches are designed: a global distribution branch and a local detail branch. The former captures the population distribution regions of interest at the level of global structure, and the latter focuses on the local details of each unit. Furthermore, to overcome the limitations of the pixel-wise MSE loss, this paper proposes an efficient loss function that perceives the likely crowd distribution over the whole image. We also apply a new upsampling mechanism that learns to produce high-quality density maps on its own. The proposed network captures the characteristics of pedestrian distribution and predicts accurate results. Evaluated on four crowd counting datasets (ShanghaiTech, NWPU, UCF_QNRF, UCF_CC_50), it obtains an MAE of 67.1 and an MSE of 108.8 on ShanghaiTech, and an MAE of 139.2 with the best MSE of 217.7 on the UCF_CC_50 dataset; our method achieves state-of-the-art performance on all the datasets.
Pages: 61697-61714
Number of pages: 18
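
The sketch below illustrates, in PyTorch, the kind of contextual module the abstract describes. The kernel sizes, channel counts, branch layout, and fusion step are assumptions made for this record, not the authors' published GTL-ASENet design; the sketch only shows how parallel convolutions with different kernel sizes can be combined with a strided-convolution/transposed-convolution path so that local detail and wider context are captured and correlated.

# Illustrative sketch only: layer choices below (kernel sizes, channels, fusion)
# are assumptions, not the GTL-ASENet architecture reported in the paper.
import torch
import torch.nn as nn

class ContextualModuleSketch(nn.Module):
    """Parallel convs with different kernel sizes plus a downsample/upsample
    (transposed-conv) path, fused by a 1x1 convolution."""

    def __init__(self, in_ch: int = 64, out_ch: int = 64):
        super().__init__()
        # Local branches with increasing receptive fields (assumed kernel sizes).
        self.local_3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.local_5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        # Wider-context branch: strided conv to enlarge the receptive field,
        # transposed conv to restore the spatial resolution.
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=4, stride=2, padding=1)
        # 1x1 fusion of the three concatenated branches.
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_a = self.relu(self.local_3(x))
        local_b = self.relu(self.local_5(x))
        wide = self.relu(self.up(self.relu(self.down(x))))
        return self.relu(self.fuse(torch.cat([local_a, local_b, wide], dim=1)))

# Quick shape check (assumes even spatial dimensions so the up path matches x).
feat = torch.randn(1, 64, 96, 128)
print(ContextualModuleSketch()(feat).shape)  # torch.Size([1, 64, 96, 128])

For the reported numbers, crowd counting papers typically compute the predicted count per image by summing the density map, define MAE as the mean absolute difference between predicted and ground-truth counts, and report MSE as the root of the mean squared count difference.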