Cross-modal collaborative representation and multi-level supervision for crowd counting

Cited by: 0
Authors
Shufang Li
Zhengping Hu
Mengyao Zhao
Shuai Bi
Zhe Sun
Affiliations
[1] Yanshan University, School of Information Science and Engineering
[2] Hebei University of Environmental Engineering, Department of Information Engineering
Source
Signal, Image and Video Processing | 2023 / Vol. 17
Keywords
Crowd counting; Cross-modal collaborative representation learning; Multi-level supervision
DOI
Not available
Abstract
Crowd features are usually extracted from RGB images for density estimation and crowd counting. However, RGB images degrade under very poor illumination, making it difficult to identify semantic objects accurately; thermal images can help address this problem. To make comprehensive use of optical and thermal imaging information, we propose a crowd counting method based on cross-modal collaborative representation and multi-level supervision. To capture the complementary features of the different modalities, RGB and thermal images are used as modality-specific streams for cross-modal collaborative learning. Missing modality-specific information is compensated and shared information is enhanced, both through aggregation and distribution computations between the specific streams and the shared stream. Furthermore, to weaken the influence of the background and strengthen the identification of crowd regions, we combine multi-scale crowd feature extraction with region recognition. Multiple output layers are added along the propagation path of the multi-modal streams to achieve multi-level supervision. Moreover, we replace the baseline training loss with the Bayesian loss, which supervises the expected count at each annotation point. Finally, comprehensive experiments on the RGBT-CC benchmark demonstrate the effectiveness of the proposed method.
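The Bayesian loss mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes the commonly used formulation in which each annotation point has an isotropic Gaussian likelihood over pixel locations, all points have equal priors, and the loss is the sum of absolute differences between 1 and the expected count of each point. The function name `bayesian_loss`, the `sigma` default, and the (row, col) point convention are hypothetical choices made for illustration, and at least one annotation point is assumed.

```python
import numpy as np


def bayesian_loss(density, points, sigma=8.0):
    """Illustrative Bayesian counting loss (assumed formulation, not the paper's code).

    density: (H, W) predicted density map.
    points:  (N, 2) annotation points as (row, col), with N >= 1.
    Returns (loss, expected), where expected[n] is the expected count of point n.
    """
    density = np.asarray(density, dtype=float)
    pts = np.asarray(points, dtype=float)
    h, w = density.shape

    # Coordinates of every pixel, flattened to shape (M, 2) with M = H * W.
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)

    # Squared distance between each annotation point and each pixel: (N, M).
    d2 = ((pts[:, None, :] - pixels[None, :, :]) ** 2).sum(axis=2)

    # Gaussian likelihoods, normalised over points to get the posterior
    # p(point n | pixel m). Subtracting the column max keeps exp() stable.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=0, keepdims=True)
    lik = np.exp(logits)
    posterior = lik / lik.sum(axis=0, keepdims=True)

    # Expected count of each point: posterior-weighted sum of the density.
    expected = posterior @ density.ravel()

    # Each annotated person should account for a count of exactly one.
    loss = np.abs(1.0 - expected).sum()
    return loss, expected
```

Because the posterior sums to one over points at every pixel, the expected counts always add up to the total predicted count; the loss is small only when each annotation point collects roughly one unit of density.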
Pages: 601-608
Page count: 7
Related papers
50 in total
  • [1] Cross-modal collaborative representation and multi-level supervision for crowd counting
    Li, Shufang
    Hu, Zhengping
    Zhao, Mengyao
    Bi, Shuai
    Sun, Zhe
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (03) : 601 - 608
  • [2] Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting
    Liu, Lingbo
    Chen, Jiaqi
    Wu, Hefeng
    Li, Guanbin
    Li, Chenglong
    Lin, Liang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4821 - 4831
  • [3] A cross-modal crowd counting method combining CNN and cross-modal transformer
    Zhang, Shihui
    Wang, Wei
    Zhao, Weibo
    Wang, Lei
    Li, Qunpeng
    IMAGE AND VISION COMPUTING, 2023, 129
  • [4] Multi-Level Cross-Modal Alignment for Image Clustering
    Qiu, Liping
    Zhang, Qin
    Chen, Xiaojun
    Cai, Shaotian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14695 - 14703
  • [5] Multi-level adversarial attention cross-modal hashing
    Wang, Benhui
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
  • [6] CCANet: A Collaborative Cross-Modal Attention Network for RGB-D Crowd Counting
    Liu, Yanbo
    Cao, Guo
    Shi, Boshan
    Hu, Yingxiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 154 - 165
  • [7] Learning the cross-modal discriminative feature representation for RGB-T crowd counting
    Li, He
    Zhang, Shihui
    Kong, Weihang
    KNOWLEDGE-BASED SYSTEMS, 2022, 257
  • [8] Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval
    Ma, Xinhong
    Zhang, Tianzhu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3101 - 3114
  • [9] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
    Ji, Zhenyan
    Yao, Weina
    Wei, Wei
    Song, Houbing
    Pi, Huaiyu
    IEEE ACCESS, 2019, 7 : 23667 - 23674
  • [10] MULTI-LEVEL CONTRASTIVE LEARNING FOR HYBRID CROSS-MODAL RETRIEVAL
    Zhao, Yiming
    Lu, Haoyu
    Zhao, Shiqi
    Wu, Haoran
    Lu, Zhiwu
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 6390 - 6394