Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework

Cited by: 230

Authors:

Song, Qingyu [1]

Wang, Changan [1]

Jiang, Zhengkai [1]

Wang, Yabiao [1]

Tai, Ying [1]

Wang, Chengjie [1]

Li, Jilin [1]

Huang, Feiyue [1]

Wu, Yang [2]

Affiliations:

[1] Tencent Youtu Lab, Shanghai, People's Republic of China

[2] Tencent PCG, Applied Research Center (ARC), Shenzhen, People's Republic of China

Source:

2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021) | 2021

Keywords:

DOI:

10.1109/ICCV48922.2021.00335

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Localizing individuals in crowds is more in accordance with the practical demands of subsequent high-level crowd analysis tasks than simply counting. However, existing localization-based methods that rely on intermediate representations (i.e., density maps or pseudo boxes) as learning targets are counter-intuitive and error-prone. In this paper, we propose a purely point-based framework for joint crowd counting and individual localization. For this framework, instead of merely reporting the absolute counting error at the image level, we propose a new metric, called density Normalized Average Precision (nAP), to provide a more comprehensive and more precise performance evaluation. Moreover, we design an intuitive solution under this framework, called Point to Point Network (P2PNet). P2PNet discards superfluous steps and directly predicts a set of point proposals to represent heads in an image, consistent with the human annotation results. Through thorough analysis, we reveal that the key step towards implementing such a novel idea is to assign optimal learning targets to these proposals. Therefore, we propose to conduct this crucial association in a one-to-one matching manner using the Hungarian algorithm. P2PNet not only significantly surpasses state-of-the-art methods on popular counting benchmarks, but also achieves promising localization accuracy. The code will be available at: TencentYoutuResearch/CrowdCounting-P2PNet.
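The one-to-one association between point proposals and annotated head points described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `match_proposals_to_targets`, the sample coordinates, and the use of plain Euclidean distance as the matching cost are all assumptions made for illustration, since the abstract does not define the cost. SciPy's `linear_sum_assignment` is used here because it solves the same minimum-cost assignment problem that the Hungarian algorithm solves.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_proposals_to_targets(proposals, targets):
    """Return (proposal_idx, target_idx) pairs of a minimum-cost one-to-one matching.

    Hypothetical helper: the cost is plain Euclidean distance, which is an
    assumption; the abstract does not specify the cost that P2PNet uses.
    """
    proposals = np.asarray(proposals, dtype=float)  # shape (N, 2)
    targets = np.asarray(targets, dtype=float)      # shape (M, 2)
    # Pairwise distances via broadcasting, shape (N, M).
    cost = np.linalg.norm(proposals[:, None, :] - targets[None, :, :], axis=-1)
    # Solves the rectangular assignment problem; min(N, M) pairs are returned.
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))


# Four predicted points and two annotated head points (made-up coordinates).
proposals = [(10.0, 10.0), (52.0, 48.0), (90.0, 15.0), (30.0, 80.0)]
targets = [(50.0, 50.0), (11.0, 9.0)]

matches = match_proposals_to_targets(proposals, targets)
matched = {p for p, _ in matches}
# Proposals left without a target would be treated as background (negatives).
unmatched = [i for i in range(len(proposals)) if i not in matched]
print("matches:", matches)
print("unmatched proposals:", unmatched)
```

In this sketch each annotated point receives exactly one proposal, and the remaining proposals have no learning target, which is the property a one-to-one matching is meant to guarantee.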


Pages: 3345-3354

Page count: 10
