CLFormer: a unified transformer-based framework for weakly supervised crowd counting and localization

Cited by: 0

Authors:

Mingfang Deng

Huailin Zhao

Ming Gao

Affiliation:

[1] Shanghai Institute of Technology, School of Electrical and Electronic Engineering

Source:

The Visual Computer | 2024 / Vol. 40 / Issue 2

Keywords:

Shunted Transformer; Weakly supervised learning; Crowd counting; Crowd localization

DOI:

Not available

Abstract:

Recent crowd counting and localization methods rely mainly on expensive point-level annotations and on convolutional neural networks with limited receptive fields, which hinders their application in complex real-world scenes. To this end, we present CLFormer, a Transformer-based weakly supervised crowd counting and localization framework. The model extracts global information from the input image using a Transformer and then passes the extracted features to both a regression branch for crowd counting and a localization branch for crowd localization. Initial proposals are produced by the localization branch and filtered via score maps generated from the extracted features, and their centers are used as pseudo-point-level annotations. Through staggered training of the two branches, the quality of the pseudo-point-level annotations is improved, and the final localization maps are generated. Experiments on four benchmark datasets (i.e., ShanghaiTech, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd) demonstrate that CLFormer obtains better counting performance than weakly supervised and fully supervised counting networks and comparable localization performance to fully supervised localization networks.
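The proposal-filtering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual filtering rule, so everything here is an assumption made for illustration: the box format (x_min, y_min, x_max, y_max), the function names box_center and pseudo_points, and the rule of keeping a proposal when the score map value at its center reaches a fixed threshold. It is not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def box_center(box: Box) -> Point:
    """Return the (x, y) center of a proposal box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def pseudo_points(
    proposals: List[Box],
    score_map: List[List[float]],
    threshold: float = 0.5,  # assumed value; not taken from the paper
) -> List[Point]:
    """Keep proposals whose center lands on a high-score cell and
    return those centers as pseudo point-level annotations."""
    height = len(score_map)
    width = len(score_map[0]) if height else 0
    points: List[Point] = []
    for box in proposals:
        cx, cy = box_center(box)
        col, row = int(cx), int(cy)
        # Skip proposals whose center falls outside the score map.
        if not (0 <= row < height and 0 <= col < width):
            continue
        # Assumed filtering rule: threshold the score at the box center.
        if score_map[row][col] >= threshold:
            points.append((cx, cy))
    return points


# Toy 4x4 score map with a single confident cell at row 1, column 1.
score_map = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.2],
    [0.1, 0.1, 0.2, 0.1],
]
proposals = [(1.0, 1.0, 2.0, 2.0), (2.0, 2.0, 4.0, 4.0)]
kept = pseudo_points(proposals, score_map)
```

In this toy input only the first proposal survives, because its center falls on the confident cell. In the framework described above, such centers would then serve as supervision for the localization branch during the staggered training of the two branches.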

Pages: 1053-1067

Page count: 14