JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting

Cited by: 2
Authors
Wang, Mingtao [1]
Zhou, Xin [1]
Chen, Yuanyuan [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
Keywords
Crowd counting; Count estimation; Multi-scale variations; Multi-density map supervision; PEOPLE; SCALE; MODEL;
DOI
10.1007/s10115-023-02056-5
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Crowd counting based on convolutional neural networks (CNNs) has made significant progress in recent years. However, the limited receptive field of CNNs makes it challenging to capture global features for comprehensive contextual modeling, resulting in insufficient accuracy in count estimation. In comparison, vision transformer (ViT)-based counting networks have demonstrated remarkable performance by exploiting their powerful global contextual modeling capabilities. However, ViT models are associated with higher computational costs and training difficulty. In this paper, we propose a novel network named JMFEEL-Net, which utilizes joint multi-scale feature enhancement and lightweight transformer to improve crowd counting accuracy. Specifically, we use a high-resolution CNN as the backbone network to generate high-resolution feature maps. In the backend network, we propose a multi-scale feature enhancement module to address the problem of low recognition accuracy caused by multi-scale variations, especially when counting small-scale objects in dense scenes. Furthermore, we introduce an improved lightweight ViT encoder to effectively model complex global contexts. We also adopt a multi-density map supervision strategy to learn crowd distribution features from feature maps of different resolutions, thereby improving the quality and training efficiency of the density maps. To validate the effectiveness of the proposed method, we conduct extensive experiments on four challenging datasets, namely ShanghaiTech Part A/B, UCF-QNRF, and JHU-Crowd++, achieving very competitive counting performance.
Pages: 3033 - 3053
Number of pages: 21
Related papers
50 in total
  • [31] Dense Crowd Counting Network Based on Multi-scale Perception
    Li, Hengchao
    Liu, Xianglian
    Liu, Peng
    Feng, Bin
    Xinan Jiaotong Daxue Xuebao/Journal of Southwest Jiaotong University, 2024, 59 (05): 1176 - 1183
  • [32] MHANet: Multi-scale hybrid attention network for crowd counting
    Yu, Ying
    Yu, Jiamao
    Qian, Jin
    Zhu, Zhiliang
    Han, Xing
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (06) : 9445 - 9455
  • [33] MSIANet: Multi-scale Interactive Attention Crowd Counting Network
    Zhang, Shihui
    Zhao, Weibo
    Wang, Lei
    Wang, Wei
    Li, Qunpeng
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (06) : 2236 - 2245
  • [34] Multi-Scale Network with Integrated Attention Unit for Crowd Counting
    Hafeezallah, Adel
    Al-Dhamari, Ahlam
    Abu-Bakar, Syed Abd Rahman
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (02): 3879 - 3903
  • [35] COMAL: compositional multi-scale feature enhanced learning for crowd counting
    Zhou, Fangbo
    Zhao, Huailin
    Zhang, Yani
    Zhang, Qing
    Liang, Lanjun
    Li, Yaoyao
    Duan, Zuodong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (15) : 20541 - 20560
  • [36] PFEL-Net: A lightweight network to enhance feature for multi-scale pedestrian detection
    Tang, Jingwen
    Lai, Huicheng
    Gao, Guxue
    Wang, Tongguan
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (08)
  • [38] LEVERAGE MULTI-SCALE DILATED CONVOLUTIONAL NEURAL NETWORK WITH GLOBAL ATTENTION FEATURE FUSION FOR CROWD COUNTING
    Lv, Meilei
    Zhang, Kuncai
    Zheng, Xiaoyun
    Yang, W. E., I
    Lu, Zhe-Ming
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2022, 18 (04): 1147 - 1162
  • [39] MSF-Net: A Lightweight Multi-Scale Feature Fusion Network for Skin Lesion Segmentation
    Shao, Dangguo
    Ren, Lifan
    Ma, Lei
    BIOMEDICINES, 2023, 11 (06)
  • [40] LMFR-Net: lightweight multi-scale feature refinement network for retinal vessel segmentation
    Zhang, Wenhao
    Qu, Shaojun
    Feng, Yuewen
    PATTERN ANALYSIS AND APPLICATIONS, 2025, 28 (02)