A Multi-Scale Feature Fusion Network With Cascaded Supervision for Cross-Scene Crowd Counting

Authors
Zhang, Xinfeng [1]
Han, Lina [1]
Shan, Wencong [1]
Wang, Xiaohu [1]
Chen, Shuhan [1]
Zhu, Congcong [1]
Li, Bin [1]
Affiliations
[1] Yangzhou Univ, Coll Informat Engn, Coll Artificial Intelligence, Jiangsu Prov Engn Res Ctr Knowledge Management & I, Yangzhou 225127, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Training; Image resolution; Location awareness; Annotations; Testing; Training data; Background suppression (BS) loss; cascaded supervision; crowd counting; dilated convolution; multi-scale feature fusion; SCALE;
DOI
10.1109/TIM.2023.3246534
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject classification codes
0808; 0809
Abstract
Counting people in public places has received much attention, and researchers have devoted considerable effort to the task. However, existing crowd counting approaches are mainly trained and tested in similar scenarios, and their performance degrades sharply when the test scenes differ in type from the training scenes. In practice, crowd scenes are highly variable, and the lack of cross-scene capability can seriously limit the application of existing approaches. We argue that improving cross-scene crowd counting requires accommodating large changes in the scale of individuals and suppressing the interference of cluttered backgrounds. To this end, we propose a multi-scale feature fusion network (MFFNet) with cascaded supervision. The multi-scale features extracted from the crowd images are upsampled and combined into several feature blocks; convolution and deconvolution operations on these blocks then yield feature matrices of different resolutions. The feature matrices are fused from bottom to top. During feature fusion, a crowd density map is predicted separately for the feature matrix at each resolution. We devise cascaded supervision to optimize the density map predictions at the different resolutions simultaneously during training. The cross-scene crowd counting experiments are conducted on four types of scenes: ShanghaiTech Part_A (SHT A) with high-density crowd scenes and small-scale individuals, ShanghaiTech Part_B (SHT B) with sparse crowd distribution and medium-scale individuals, the UCF_CC_50 dataset with extremely dense scenes and tiny-scale individuals, and the UCF-QNRF dataset with extreme variations.
MFFNet exhibits the strongest scene adaptability among the compared state-of-the-art approaches, with average decreases of 17.1% and 8.4% in mean absolute error (MAE) and mean square error (MSE), respectively. The contribution of each component of our method is verified in an ablation study using the devised evaluation metrics. Our implementation will be available at https://github.com/learnsharing/MFFNet.
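The two error measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the MFFNet implementation: the function names `count_errors` and `relative_decrease` and the sample counts are hypothetical. Many crowd counting papers report the square root of the mean squared count error under the name MSE; the abstract does not say which convention it follows, so the `root` flag below is an assumption that exposes both.

```python
import math


def count_errors(predicted, actual, root=True):
    """Return (MAE, MSE) over per-image crowd counts.

    Assumption: with root=True the second value is the square root of the
    mean squared error, a convention common in crowd counting; with
    root=False it is the plain mean squared error.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, (math.sqrt(mse) if root else mse)


def relative_decrease(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Hypothetical per-image counts, for illustration only.
predicted = [98.0, 205.0, 47.0, 310.0]
actual = [100.0, 200.0, 50.0, 300.0]
mae, mse = count_errors(predicted, actual)
print(f"MAE = {mae:.2f}, MSE = {mse:.2f}")
```

In a cross-scene setting, these errors would be computed on a test set from a different dataset than the one used for training, and `relative_decrease` would compare the result with a baseline method's error on the same train/test pair.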
Pages: 15