Attend to count: Crowd counting with adaptive capacity multi-scale CNNs

Cited by: 45

Authors:

Zou, Zhikang [1]

Cheng, Yu [2]

Qu, Xiaoye [1]

Ji, Shouling [3]

Guo, Xiaoxiao [4]

Zhou, Pan [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Hubei, Peoples R China

[2] Microsoft Res & AI, Beijing, Peoples R China

[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China

[4] IBM & AI Fdn Learning, Beijing, Peoples R China

Source:

NEUROCOMPUTING | 2019 / Volume 367

Keywords:

Crowd counting; Attention mechanism; Multi-scale CNNs; Adaptive capacity

DOI:

10.1016/j.neucom.2019.08.009

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Crowd counting is a challenging task due to the large variations in crowd distributions. Previous methods tend to process the whole image with a single fixed structure, which cannot handle diverse, complicated scenes with different crowd densities. Hence, we propose the Adaptive Capacity Multi-scale convolutional neural networks (ACM-CNN), a novel crowd counting approach that can assign different capacities to different portions of the input. The intuition is that the model should focus on important regions of the input image and optimize its capacity allocation conditioned on the degree of crowd density. ACM-CNN consists of three types of modules: a coarse network, a fine network, and a smooth network. The coarse network is used to explore the areas that need to be focused on via a count attention mechanism, and to generate a rough feature map. Then the fine network processes the areas of interest into a fine feature map. To alleviate the sense of division caused by fusion, the smooth network is designed to combine the two feature maps organically to produce high-quality density maps. Extensive experiments are conducted on five mainstream datasets. The results demonstrate the effectiveness of the proposed model for both density estimation and crowd counting tasks. (C) 2019 Elsevier B.V. All rights reserved.
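
The coarse, fine, and smooth stages described in the abstract can be pictured with a toy sketch. This is not the paper's model. Every function name, the threshold, the scaling constants, and the 3x3 mean filter below are hypothetical stand-ins chosen only to show the data flow: a cheap pass over the whole image, a mask that selects regions for extra processing, and a fusion step that softens the seams. The final count is read off as the sum of the density map, which is the usual convention in density-based crowd counting.

```python
# Toy illustration only: plain-Python stand-ins for the three modules named
# in the abstract. None of these functions reproduce the paper's networks.

def coarse_pass(image):
    """Stand-in for the coarse network: a cheap per-pixel density guess."""
    return [[0.5 * v for v in row] for row in image]

def attention_mask(rough, threshold):
    """Stand-in for count attention: flag cells whose rough density is high."""
    return [[v > threshold for v in row] for row in rough]

def fine_pass(image, mask):
    """Stand-in for the fine network: spend extra work only on flagged cells."""
    return [[0.9 * v if m else 0.0 for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

def smooth_fuse(rough, fine, mask):
    """Stand-in for the smooth network: take the fine value where available,
    the rough value elsewhere, then apply a 3x3 mean filter to soften seams."""
    h, w = len(rough), len(rough[0])
    fused = [[fine[i][j] if mask[i][j] else rough[i][j] for j in range(w)]
             for i in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Window is clipped at the image border.
            window = [fused[a][b]
                      for a in range(max(0, i - 1), min(h, i + 2))
                      for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(window) / len(window)
    return out

def estimate_count(image, threshold=0.3):
    """Run the three toy stages and return (density_map, count)."""
    rough = coarse_pass(image)
    mask = attention_mask(rough, threshold)
    fine = fine_pass(image, mask)
    density = smooth_fuse(rough, fine, mask)
    # The crowd count is read off as the sum of the density map.
    return density, sum(sum(row) for row in density)
```

Calling `estimate_count` on a small nested list of floats returns the fused density map together with its sum. In the actual ACM-CNN each stage is a learned convolutional network, so the sketch conveys only the order of operations, not the behavior of the model.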

Pages: 75-83

Number of pages: 9
