CLFormer: a unified transformer-based framework for weakly supervised crowd counting and localization

Citations: 0
Authors
Mingfang Deng
Huailin Zhao
Ming Gao
Affiliation
[1] Shanghai Institute of Technology, School of Electrical and Electronic Engineering
Keywords
Shunted Transformer; Weakly supervised learning; Crowd counting; Crowd localization
DOI
Not available
Abstract
Recent progress in crowd counting and localization relies mainly on expensive point-level annotations and on convolutional neural networks with limited receptive fields, which hinders their application in complex real-world scenes. To address this, we present CLFormer, a Transformer-based framework for weakly supervised crowd counting and localization. The model uses a Transformer to extract global information from the input image and passes the extracted features to two branches: a regression branch for crowd counting and a localization branch for crowd localization. The localization branch produces initial proposals, which are filtered using score maps generated from the extracted features; the centers of the retained proposals serve as pseudo point-level annotations. Staggered training of the two branches improves the quality of these pseudo point-level annotations and yields the final localization maps. Experiments on four benchmark datasets (ShanghaiTech, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd) show that CLFormer achieves better counting performance than both weakly supervised and fully supervised counting networks, and localization performance comparable to that of fully supervised localization networks.
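The pseudo-label step in the abstract (filter localization proposals with a score map, then keep the centers of the survivors as pseudo points) can be illustrated with a minimal sketch. The abstract does not state the exact filtering rule, so this example assumes one: a proposal is kept when the mean score-map value inside its box reaches a fixed threshold. The function name `pseudo_points_from_proposals`, the `[x1, y1, x2, y2]` box layout, and the `threshold` parameter are hypothetical and not taken from CLFormer.

```python
import numpy as np


def pseudo_points_from_proposals(boxes, score_map, threshold=0.5):
    """Hypothetical helper (not CLFormer's code): keep proposals whose mean
    score-map value is at least `threshold` and return their box centers
    as pseudo point-level annotations.

    boxes: (N, 4) array-like of [x1, y1, x2, y2] in score-map pixel coordinates.
    score_map: (H, W) array of per-pixel scores, e.g. in [0, 1].
    Returns an (M, 2) float array of (x, y) centers with M <= N.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    h, w = score_map.shape
    centers = []
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the map and round outward to integer pixel bounds,
        # guaranteeing at least one pixel so the mean below is well defined.
        c0 = int(np.clip(np.floor(x1), 0, w - 1))
        r0 = int(np.clip(np.floor(y1), 0, h - 1))
        c1 = int(np.clip(np.ceil(x2), c0 + 1, w))
        r1 = int(np.clip(np.ceil(y2), r0 + 1, h))
        # Assumed filtering rule: average score inside the box vs. a threshold.
        region_score = score_map[r0:r1, c0:c1].mean()
        if region_score >= threshold:
            centers.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0))
    return np.array(centers, dtype=float).reshape(-1, 2)


# Usage: only the top-left box lies on a high-score region, so one pseudo point survives.
score_map = np.zeros((8, 8))
score_map[0:4, 0:4] = 0.9
points = pseudo_points_from_proposals([[0, 0, 4, 4], [4, 4, 8, 8]], score_map)
print(points)  # [[2. 2.]]
```

Under the staggered training the abstract describes, a step like this would be repeated whenever the localization branch is updated, so the pseudo points can improve over rounds; that outer training loop, and how the regression branch interacts with it, is not modeled here.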
Pages: 1053-1067
Number of pages: 14