TransCrowd: weakly-supervised crowd counting with transformers

Cited by: 118

Authors:

Liang, Dingkang [1]

Chen, Xiwu [2]

Xu, Wei [3]

Zhou, Yu [2]

Bai, Xiang [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China

[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China

[3] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China

Source:

SCIENCE CHINA-INFORMATION SCIENCES | 2022 / Vol. 65 / Issue 06

Funding:

National Key Research and Development Program of China;

Keywords:

crowd counting; visual transformer; weakly supervised; crowd analysis; transformer; SCALE;

DOI:

10.1007/s11432-021-3445-y

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Mainstream crowd counting methods usually use a convolutional neural network (CNN) to regress a density map, which requires point-level annotations. However, annotating each person with a point is an expensive and laborious process. During the testing phase, point-level annotations are not used to evaluate counting accuracy, which means they are redundant. Hence, it is desirable to develop weakly-supervised counting methods that rely only on count-level annotations, a more economical way of labeling. Current weakly-supervised counting methods adopt a CNN to regress the total count of the crowd through an image-to-count paradigm. However, a limited receptive field for context modeling is an intrinsic limitation of these weakly-supervised CNN-based methods. These methods thus cannot achieve satisfactory performance, which limits their real-world applications. The transformer is a popular sequence-to-sequence prediction model in natural language processing (NLP) that has a global receptive field. In this paper, we propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from a sequence-to-count perspective based on transformers. We observe that the proposed TransCrowd can effectively extract semantic crowd information by using the self-attention mechanism of the transformer. To the best of our knowledge, this is the first work to adopt a pure transformer for crowd counting research. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods and highly competitive counting performance compared with some popular fully-supervised counting methods.
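The abstract's point that point-level annotations play no role at test time can be illustrated with a small sketch. It assumes counting accuracy is measured with mean absolute error (MAE) and root mean squared error (RMSE) over per-image counts, which is common in crowd counting, though the abstract does not name its metrics. The function names and toy data below are hypothetical and are not taken from the paper.

```python
import math


def count_from_points(points):
    """Reduce point-level annotations to a count-level label (number of points)."""
    return len(points)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def rmse(pred_counts, true_counts):
    """Root mean squared error between predicted and ground-truth counts."""
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: two images with (x, y) head annotations.
point_annotations = [
    [(10, 12), (40, 8), (33, 57)],
    [(5, 5), (70, 21)],
]

# Only the number of points per image enters the evaluation.
true_counts = [count_from_points(points) for points in point_annotations]

# Hypothetical model outputs (one predicted count per image).
pred_counts = [2.5, 4.0]

print("MAE:", mae(pred_counts, true_counts))
print("RMSE:", rmse(pred_counts, true_counts))
```

Under this assumption, the point coordinates never appear in either metric; a count-level label per image is enough to evaluate a model, and that same label is the only supervision a weakly-supervised method uses for training.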

Pages: 14
