Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds

Cited by: 580
Authors
Idrees, Haroon [1 ]
Tayyab, Muhmmad [5 ]
Athrey, Kishan [5 ]
Zhang, Dong [2 ]
Al-Maadeed, Somaya [3 ]
Rajpoot, Nasir [4 ]
Shah, Mubarak [5 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] NVIDIA Inc, Santa Clara, CA USA
[3] Qatar Univ, Fac Engn, Comp Sci Dept, Doha, Qatar
[4] Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England
[5] Univ Cent Florida, Ctr Res Comp Vis, Orlando, FL 32816 USA
Source
COMPUTER VISION - ECCV 2018, PT II | 2018, Vol. 11206
Keywords
Crowd counting; Localization; Composition loss; Convolutional Neural Networks
DOI
10.1007/978-3-030-01216-8_33
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision. In particular, counting in highly dense crowds is a challenging problem with far-reaching applicability in crowd safety and management, as well as in gauging the political significance of protests and demonstrations. In this paper, we propose a novel approach that simultaneously solves the problems of counting, density map estimation and localization of people in a given dense crowd image. Our formulation is based on the important observation that the three problems are inherently related to each other, which makes the loss function for optimizing a deep CNN decomposable. Since localization requires high-quality images and annotations, we introduce the UCF-QNRF dataset, which overcomes the shortcomings of previous datasets and contains 1.25 million humans manually marked with dot annotations. Finally, we present evaluation measures and a comparison with recent deep CNNs, including those developed specifically for crowd counting. Our approach significantly outperforms the state of the art on the new dataset, which is the most challenging dataset with the largest number of crowd annotations in the most diverse set of scenes.
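To make the idea of a decomposable (composition) loss concrete, the sketch below combines density map, localization, and count terms into a single training objective. The record above does not give the authors' exact formulation, so the term names, weights, map resolutions, and loss choices here are illustrative assumptions, not the method defined in the paper.

# Minimal sketch of a composition-style loss, assuming PyTorch tensors and
# illustrative term weights; not the authors' exact formulation.
import torch
import torch.nn.functional as F

def composition_loss(pred_density, pred_localization, gt_density, gt_localization,
                     w_density=1.0, w_localization=1.0, w_count=0.1):
    """Combine density, localization, and count terms into one loss.

    pred_density / gt_density:           (B, 1, H, W) density maps
    pred_localization / gt_localization: (B, 1, H, W) sharper dot maps
    """
    # Density term: pixel-wise regression against the ground-truth density map.
    loss_density = F.mse_loss(pred_density, gt_density)

    # Localization term: encourages sharp peaks at annotated head positions.
    loss_localization = F.mse_loss(pred_localization, gt_localization)

    # Count term: the integral of the density map should match the true count.
    pred_count = pred_density.sum(dim=(1, 2, 3))
    gt_count = gt_density.sum(dim=(1, 2, 3))
    loss_count = F.l1_loss(pred_count, gt_count)

    return (w_density * loss_density
            + w_localization * loss_localization
            + w_count * loss_count)

if __name__ == "__main__":
    # Toy tensors standing in for network outputs and ground truth.
    B, H, W = 2, 64, 64
    pred_d = torch.rand(B, 1, H, W, requires_grad=True)
    pred_l = torch.rand(B, 1, H, W, requires_grad=True)
    gt_d = torch.rand(B, 1, H, W)
    gt_l = torch.rand(B, 1, H, W)
    loss = composition_loss(pred_d, pred_l, gt_d, gt_l)
    loss.backward()
    print(float(loss))

Because all three terms are computed from the same network outputs, a single backward pass optimizes counting, density estimation and localization jointly, which is the relationship the abstract highlights.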
Pages: 544-559
Number of pages: 16