TransCrowd: weakly-supervised crowd counting with transformers

Authors
Dingkang Liang
Xiwu Chen
Wei Xu
Yu Zhou
Xiang Bai
Affiliations
[1] Huazhong University of Science and Technology, School of Artificial Intelligence and Automation
[2] Huazhong University of Science and Technology, School of Electronic Information and Communication
[3] Beijing University of Posts and Telecommunications, School of Artificial Intelligence
Source
Science China Information Sciences | 2022 / Vol. 65
Keywords
crowd counting; visual transformer; weakly supervised; crowd analysis; transformer
DOI
Not available
Abstract
Mainstream crowd counting methods usually use a convolutional neural network (CNN) to regress a density map, which requires point-level annotations. However, annotating each person with a point is an expensive and laborious process. Moreover, point-level annotations are not used to evaluate counting accuracy during testing, which means they are redundant. Hence, it is desirable to develop weakly-supervised counting methods that rely only on count-level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt a CNN to regress the total count of a crowd through an image-to-count paradigm. However, a limited receptive field for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods. As a result, these methods do not achieve satisfactory performance, which limits their real-world applications. The transformer is a popular sequence-to-sequence prediction model in natural language processing (NLP) and has a global receptive field. In this paper, we propose TransCrowd, which reformulates weakly-supervised crowd counting from a sequence-to-count perspective based on transformers. We observe that TransCrowd can effectively extract semantic crowd information through the self-attention mechanism of the transformer. To the best of our knowledge, this is the first work to adopt a pure transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that TransCrowd achieves superior performance compared with all weakly-supervised CNN-based counting methods and highly competitive counting performance compared with some popular fully-supervised counting methods.
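The sequence-to-count idea described above can be illustrated with a toy sketch. The pure-Python code below is not the authors' implementation: the helper names (`patchify`, `self_attention`, `predict_count`, `count_loss`), the single attention layer, the 4x4 patch size, the embedding width of 8, the random untrained weights, the lack of positional embeddings, global average pooling as the aggregation step, and the L1 count loss are all assumptions chosen for illustration. It shows the three ingredients the abstract names: the image becomes a sequence of patch tokens, self-attention lets every token attend to every other token (a global receptive field), and supervision comes only from a scalar count, with no point-level annotations.

```python
import math
import random

# Toy sketch of a sequence-to-count model. All helper names, sizes, and the
# GAP + L1-loss choices are hypothetical illustrations, not TransCrowd's code.


def patchify(image, patch):
    """Split an H x W grayscale image (list of lists) into non-overlapping
    patch x patch blocks and flatten each block into one token vector."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h - h % patch, patch):
        for c in range(0, w - w % patch, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix, given as a list of rows, by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(xs):
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention. Each output token is a
    weighted mix of ALL input tokens, i.e. a global receptive field."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v))
                    for c in range(d)])
    return out


def init_params(patch=4, dim=8, seed=0):
    """Random, untrained weights with assumed sizes."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(dim, patch * patch),  # linear patch embedding
        "wq": mat(dim, dim),
        "wk": mat(dim, dim),
        "wv": mat(dim, dim),
        "head": [rng.gauss(0.0, 0.1) for _ in range(dim)],  # count regressor
        "bias": 0.0,
    }


def predict_count(image, params, patch=4):
    """Image -> token sequence -> self-attention -> pooled feature -> count."""
    tokens = [matvec(params["embed"], t) for t in patchify(image, patch)]
    # No positional embeddings in this toy version (an assumed simplification).
    attended = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    # Global average pooling over the token sequence.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    return sum(wgt * x for wgt, x in zip(params["head"], pooled)) + params["bias"]


def count_loss(pred, gt_count):
    """L1 loss against the image-level count; no point annotations needed."""
    return abs(pred - gt_count)


# Usage: a 16 x 16 synthetic image yields 16 tokens of 4 x 4 = 16 pixels each.
image = [[((r * 16 + c) % 7) / 7.0 for c in range(16)] for r in range(16)]
params = init_params()
pred = predict_count(image, params)
loss = count_loss(pred, 42.0)  # 42.0 is a made-up count label
```

Training would minimize `count_loss` over images labeled only with their total count; the gradient step is omitted here. Because every patch token attends to all others within a single layer, context from the whole image can inform each token, whereas a CNN's receptive field grows only with depth.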