CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Times cited: 1214
Authors
Li, Yuhong [1, 2]
Zhang, Xiaofan [1]
Chen, Deming [1]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
[2] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Source
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018
DOI
10.1109/CVPR.2018.00120
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a network for Congested Scene Recognition, called CSRNet, that provides a data-driven deep learning method able to understand highly congested scenes, perform accurate count estimation, and produce high-quality density maps. CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction, and a dilated CNN as the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is easy to train because of its purely convolutional structure. We evaluate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldExpo'10 dataset, and the UCSD dataset) and achieve state-of-the-art performance. On the ShanghaiTech Part B dataset, CSRNet achieves a 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We also extend the targeted applications to counting other objects, such as the vehicles in the TRANCOS dataset, where CSRNet significantly improves output quality with a 15.4% lower MAE than the previous state-of-the-art approach.
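The abstract's central architectural claim is that dilated kernels enlarge the receptive field without the resolution loss that pooling causes. That arithmetic can be checked with a small, dependency-free Python sketch. The `Layer` record, the helper names, and the three toy layer stacks are hypothetical illustrations, not CSRNet's published configuration. The count and MAE helpers assume the common density-map convention that a predicted count is the sum over the density map.

```python
"""Receptive-field and output-size arithmetic for stacked convolutions.

Illustrative sketch only: the layer stacks below are hypothetical and chosen
to contrast dilation with pooling; they are not CSRNet's actual layers.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    kernel: int
    stride: int = 1
    dilation: int = 1
    padding: int = 0


def effective_kernel(layer: Layer) -> int:
    # A k x k kernel with dilation d spans d*(k-1)+1 input pixels per side.
    return layer.dilation * (layer.kernel - 1) + 1


def receptive_field(layers: list[Layer]) -> int:
    """Receptive field (in input pixels) of one output unit after the stack."""
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent outputs
    for layer in layers:
        rf += (effective_kernel(layer) - 1) * jump
        jump *= layer.stride
    return rf


def output_size(n: int, layers: list[Layer]) -> int:
    """Spatial size of the output for an n x n input (standard conv formula)."""
    for layer in layers:
        n = (n + 2 * layer.padding - effective_kernel(layer)) // layer.stride + 1
    return n


def estimated_count(density: list[list[float]]) -> float:
    # Assumed convention: the predicted count is the sum of the density map.
    return sum(sum(row) for row in density)


def mean_absolute_error(pred: list[float], truth: list[float]) -> float:
    """MAE between predicted and ground-truth counts over a set of images."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equal length")
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


if __name__ == "__main__":
    plain = [Layer(3, padding=1)] * 3                 # ordinary 3x3 convs
    dilated = [Layer(3, dilation=2, padding=2)] * 3   # same convs, dilation 2
    pooled = [Layer(3, padding=1), Layer(2, stride=2), Layer(3, padding=1),
              Layer(2, stride=2), Layer(3, padding=1)]  # convs + 2x2 max-pools
    for name, stack in [("plain", plain), ("dilated", dilated), ("pooled", pooled)]:
        print(name, receptive_field(stack), output_size(64, stack))
```

Running it prints `plain 7 64`, `dilated 13 64`, and `pooled 18 16`. With the same three 3x3 convolutions (and the same parameter count), dilation rate 2 nearly doubles the receptive field of the plain stack while keeping the 64x64 resolution, whereas interleaved pooling reaches a larger field only by shrinking the map to 16x16.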
Pages: 1091-1100
Page count: 10