TransCrowd: weakly-supervised crowd counting with transformers

Cited by: 0

Authors:

Dingkang Liang

Xiwu Chen

Wei Xu

Yu Zhou

Xiang Bai

Affiliations:

[1] Huazhong University of Science and Technology, School of Artificial Intelligence and Automation

[2] Huazhong University of Science and Technology, School of Electronic Information and Communication

[3] Beijing University of Posts and Telecommunications, School of Artificial Intelligence

Source:

Science China Information Sciences | 2022 / Vol. 65

Keywords:

crowd counting; visual transformer; weakly supervised; crowd analysis; transformer

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Mainstream crowd counting methods usually use a convolutional neural network (CNN) to regress a density map, which requires point-level annotations. However, annotating each person with a point is an expensive and laborious process. During the testing phase, the point-level annotations are not used to evaluate counting accuracy, which means they are redundant. Hence, it is desirable to develop weakly-supervised counting methods that rely only on count-level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt a CNN to regress the total count of the crowd in an image-to-count paradigm. However, a limited receptive field for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods. These methods thus cannot achieve satisfactory performance, which limits their real-world applications. The transformer is a popular sequence-to-sequence prediction model in natural language processing (NLP) that has a global receptive field. In this paper, we propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from a sequence-to-count perspective based on transformers. We observe that the proposed TransCrowd can effectively extract semantic crowd information by using the self-attention mechanism of the transformer. To the best of our knowledge, this is the first work to adopt a pure transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods and highly competitive counting performance compared with some popular fully-supervised counting methods.

