TransCrowd: weakly-supervised crowd counting with transformers

Cited by: 118
Authors
Liang, Dingkang [1]
Chen, Xiwu [2]
Xu, Wei [3]
Zhou, Yu [2]
Bai, Xiang [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
[3] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
crowd counting; visual transformer; weakly supervised; crowd analysis; transformer; SCALE;
DOI
10.1007/s11432-021-3445-y
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Mainstream crowd counting methods typically use a convolutional neural network (CNN) to regress a density map, which requires point-level annotations. However, annotating each person with a point is expensive and laborious. Moreover, point-level annotations are not used to evaluate counting accuracy at test time, which means they are redundant. It is therefore desirable to develop weakly-supervised counting methods that rely only on count-level annotations, a more economical form of labeling. Current weakly-supervised counting methods use a CNN to regress the total crowd count in an image-to-count paradigm. However, a limited receptive field for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods, so they cannot achieve satisfactory performance and have limited real-world applications. The transformer, a popular sequence-to-sequence prediction model in natural language processing (NLP), has a global receptive field. In this paper, we propose TransCrowd, which reformulates weakly-supervised crowd counting as a sequence-to-count problem based on transformers. We observe that TransCrowd can effectively extract semantic crowd information through the self-attention mechanism of the transformer. To the best of our knowledge, this is the first work to adopt a pure transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that TransCrowd outperforms all weakly-supervised CNN-based counting methods and is highly competitive with some popular fully-supervised counting methods.
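To make the sequence-to-count idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the image is cut into non-overlapping patches that serve as tokens, applies a single self-attention step with identity projections, mean-pools the result, and maps it to a scalar count with a linear head. The function names, the patch size, the toy image, and the untrained head weights are all hypothetical choices made for illustration.

```python
import math

def split_into_patches(image, patch):
    """Cut an H x W grid (list of rows) into non-overlapping patch x patch tokens."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head attention with identity Q/K/V projections (an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Every token attends to every other token: a global receptive field.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def sequence_to_count(image, patch, head_weights, head_bias=0.0):
    """Map an image to one scalar: patches -> attention -> mean pool -> linear head."""
    attended = self_attention(split_into_patches(image, patch))
    d = len(attended[0])
    pooled = [sum(t[i] for t in attended) / len(attended) for i in range(d)]
    return sum(w * x for w, x in zip(head_weights, pooled)) + head_bias

# Toy 4 x 4 "image"; the head weights are untrained placeholders.
image = [[0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
tokens = split_into_patches(image, 2)
count = sequence_to_count(image, 2, head_weights=[1.0, 1.0, 1.0, 1.0])
```

Under count-level supervision, a model of this shape would be trained by comparing the predicted scalar with the annotated total count for each image, so no per-person point annotations enter the loss.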
Pages: 14