Self-attention Guidance Based Crowd Localization and Counting

Cited by: 1

Authors:

Ma, Zhouzhou ^{[1,2]}

Gu, Guanghua ^{[1,2]}

Zhao, Wenrui ^{[1,2]}

Affiliations:

[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao 066000, Peoples R China

[2] Hebei Key Lab Informat Transmiss & Signal Proc, Qinhuangdao 066000, Peoples R China

Source:

MACHINE INTELLIGENCE RESEARCH | 2024 / Vol. 21 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

Crowd localization; crowd counting; transformer; point supervision; object detection; IMAGE; NETWORK;

DOI:

10.1007/s11633-023-1428-6

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Most existing studies on crowd analysis are limited to counting and cannot provide the exact location of individuals. This paper proposes a self-attention guidance based crowd localization and counting network (SA-CLCN), which locates and counts crowds simultaneously. We frame the task as object detection and train the network using the original point annotations of crowd datasets as supervision. The network predicts the center point coordinate of each head as well as the crowd count. Specifically, to cope with the spatial and positional variations of the crowd, the proposed method combines a transformer with a convolutional structure to construct a global-local feature extractor (GLFE). The GLFE establishes near-to-far dependencies between elements so that the global context and local detail features of the crowd image can be extracted simultaneously. This paper then designs a pyramid feature fusion module (PFFM) that fuses the global and local information from high level to low level to obtain a multiscale feature representation. For the downstream tasks, candidate point offsets and confidence scores are predicted by a simple regression head and a classification head. In addition, the Hungarian algorithm is used to match the predicted point set with the labelled point set so that the losses can be calculated. The proposed network avoids the errors or higher costs associated with traditional density maps or bounding box annotations. Extensive experiments on several crowd datasets show that the proposed method produces competitive results in both counting and localization.
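
To illustrate the matching step the abstract mentions, the following is a minimal sketch of a one-to-one assignment between predicted points and labelled points. It is not the authors' implementation. The function name `match_points`, the cost form (weighted Euclidean distance minus confidence), and the `dist_weight` value are all assumptions made for illustration; the abstract does not state the cost function. For brevity the optimum is found by brute force over permutations, which is only practical for a handful of points; the Hungarian algorithm reaches the same minimum-cost assignment in polynomial time.

```python
from itertools import permutations
from math import dist


def match_points(pred_points, pred_scores, gt_points, dist_weight=0.05):
    """Return (pred_index, gt_index) pairs with minimum total cost.

    Assumed cost (hypothetical, not taken from the paper):
        dist_weight * euclidean_distance - confidence
    Requires at least as many predictions as labelled points.
    """
    n_pred, n_gt = len(pred_points), len(gt_points)
    if n_pred < n_gt:
        raise ValueError("need at least as many predictions as labelled points")

    def cost(i, j):
        return dist_weight * dist(pred_points[i], gt_points[j]) - pred_scores[i]

    best_cost, best_assignment = float("inf"), ()
    # Brute-force search: try every ordered choice of n_gt predictions.
    # The Hungarian algorithm finds the same optimum far more efficiently.
    for chosen in permutations(range(n_pred), n_gt):
        total = sum(cost(i, j) for j, i in enumerate(chosen))
        if total < best_cost:
            best_cost, best_assignment = total, chosen
    return [(i, j) for j, i in enumerate(best_assignment)]


# Three candidate points with confidences, two labelled head centers.
pred_points = [(10.0, 10.0), (50.0, 52.0), (90.0, 90.0)]
pred_scores = [0.9, 0.8, 0.1]
gt_points = [(12.0, 11.0), (49.0, 50.0)]
matches = match_points(pred_points, pred_scores, gt_points)
```

In this sketch, predictions that appear in `matches` would be treated as positives for the regression and classification losses, and the remaining predictions (here, index 2) as background.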


Pages: 966-982

Page count: 17


[21] Attention to Head Locations for Crowd Counting
Zhang, Youmei
Zhou, Chunluan
Chang, Faliang
Kot, Alex C.
Zhang, Wei
IMAGE AND GRAPHICS, ICIG 2019, PT II, 2019, 11902 : 727 - 737
[22] Retinal Vessel Segmentation Based on Self-Attention Feature Selection
Jiang, Ligang
Li, Wen
Xiong, Zhiming
Yuan, Guohui
Huang, Chongjun
Xu, Wenhao
Zhou, Lu
Qu, Chao
Wang, Zhuoran
Tong, Yuhua
ELECTRONICS, 2024, 13 (17)
[23] Crowd Counting Guided by Attention Network
Nie, Pei
Fan, Cien
Zou, Lian
Chen, Liqiong
Li, Xiaopeng
INFORMATION, 2020, 11 (12) : 1 - 10
[24] Spatiotemporal module for video saliency prediction based on self-attention
Wang, Yuhao
Liu, Zhuoran
Xia, Yibo
Zhu, Chunbo
Zhao, Danpei
IMAGE AND VISION COMPUTING, 2021, 112
[25] NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization
Wang, Qi
Gao, Junyu
Lin, Wei
Li, Xuelong
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (06) : 2141 - 2149
[26] MULTISCALE CROWD COUNTING AND LOCALIZATION BY MULTITASK POINT SUPERVISION
Zand, Mohsen
Damirchi, Haleh
Farley, Andrew
Molahasani, Mahdiyar
Greenspan, Michael
Etemad, Ali
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1820 - 1824
[27] Dense Crowd Counting Algorithm Based on New Multi-scale Attention Mechanism
Wan, Honglin
Wang, Xiaomin
Peng, Zhenwei
Bai, Zhiquan
Yang, Xinghai
Sun, Jiande
JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2022, 44 (03) : 1129 - 1136
[28] Session-Based Recommendation with Self-Attention
Anh, Pham Hoang
Bach, Ngo Xuan
Phuong, Tu Minh
SOICT 2019: PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY, 2019, : 1 - 8
[30] CLFormer: a unified transformer-based framework for weakly supervised crowd counting and localization
Deng, Mingfang
Zhao, Huailin
Gao, Ming
VISUAL COMPUTER, 2024, 40 (02) : 1053 - 1067