JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting

Cited by: 2
Authors
Wang, Mingtao [1]
Zhou, Xin [1]
Chen, Yuanyuan [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
Keywords
Crowd counting; Count estimation; Multi-scale variations; Multi-density map supervision; PEOPLE; SCALE; MODEL
DOI
10.1007/s10115-023-02056-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Crowd counting based on convolutional neural networks (CNNs) has made significant progress in recent years. However, the limited receptive field of CNNs makes it challenging to capture global features for comprehensive contextual modeling, resulting in insufficient accuracy in count estimation. In comparison, vision transformer (ViT)-based counting networks have demonstrated remarkable performance by exploiting their powerful global contextual modeling capabilities, although ViT models come with higher computational costs and greater training difficulty. In this paper, we propose a novel network named JMFEEL-Net, which combines joint multi-scale feature enhancement with a lightweight transformer to improve crowd counting accuracy. Specifically, we use a high-resolution CNN as the backbone network to generate high-resolution feature maps. In the backend network, we propose a multi-scale feature enhancement module to address the problem of low recognition accuracy caused by multi-scale variations, especially when counting small-scale objects in dense scenes. Furthermore, we introduce an improved lightweight ViT encoder to effectively model complex global contexts. We also adopt a multi-density map supervision strategy to learn crowd distribution features from feature maps of different resolutions, thereby improving the quality of the density maps and the training efficiency. To validate the effectiveness of the proposed method, we conduct extensive experiments on four challenging datasets, namely ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, and JHU-Crowd++, achieving very competitive counting performance.
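The abstract does not specify how the multi-density map supervision is computed. As a rough illustration only, the general idea of supervising density maps at several resolutions might be sketched as follows. Everything here is an assumption rather than the authors' method: the function names `sum_pool` and `multi_density_loss`, the downsampling factors, the equal weights, and the choice of mean squared error are all hypothetical. The sketch relies on one standard fact about density-map counting: the sum of a density map equals the crowd count, so downsampling by block summation keeps the count unchanged.

```python
import numpy as np


def sum_pool(density, factor):
    """Downsample a 2-D density map by summing factor x factor blocks.

    Block summation (rather than averaging) preserves the total count.
    """
    h, w = density.shape
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the factor")
    blocks = density.reshape(h // factor, factor, w // factor, factor)
    return blocks.sum(axis=(1, 3))


def multi_density_loss(preds, gt_density, factors, weights=None):
    """Weighted sum of per-resolution MSE losses (hypothetical formulation).

    preds[i] is the predicted density map at downsampling factor factors[i].
    """
    if weights is None:
        weights = [1.0] * len(factors)  # assumption: equal weighting
    total = 0.0
    for pred, factor, weight in zip(preds, factors, weights):
        target = sum_pool(gt_density, factor)
        total += weight * float(np.mean((pred - target) ** 2))
    return total


# Toy ground truth: three annotated heads as unit impulses on an 8 x 8 grid.
# (Real pipelines usually blur the impulses with a Gaussian kernel.)
gt = np.zeros((8, 8))
gt[1, 2] = gt[5, 6] = gt[6, 6] = 1.0

factors = [1, 2, 4]  # assumption: three supervision resolutions
targets = [sum_pool(gt, f) for f in factors]
counts = [float(t.sum()) for t in targets]

perfect_loss = multi_density_loss(targets, gt, factors)
empty_preds = [np.zeros_like(t) for t in targets]
empty_loss = multi_density_loss(empty_preds, gt, factors)
```

In this toy setup every entry of `counts` equals the number of annotated heads, a perfect prediction gives zero loss, and an all-zero prediction gives a positive loss. In a real training loop the predictions would come from network heads attached to feature maps of different resolutions.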
Pages: 3033-3053
Page count: 21