Cross-modal collaborative representation and multi-level supervision for crowd counting

Cited by: 0

Authors:

Shufang Li

Zhengping Hu

Mengyao Zhao

Shuai Bi

Zhe Sun

Affiliations:

[1] Yanshan University, School of Information Science and Engineering

[2] Hebei University of Environmental Engineering, Department of Information Engineering

Source:

Signal, Image and Video Processing | 2023 / Vol. 17

Keywords:

Crowd counting; Cross-modal collaborative representation learning; Multi-level supervision

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Crowd features are often extracted from RGB images for density estimation and crowd counting. However, RGB images degrade under particularly poor illumination, making it difficult to identify semantic objects accurately; thermal images can help solve this problem. To make comprehensive use of optical and thermal imaging information, we propose a crowd counting method based on cross-modal collaborative representation and multi-level supervision. To capture the complementary features of the different modalities, RGB and thermal images are used as the modality-specific streams of cross-modal collaborative learning. Missing modality-specific information is compensated and shared information is enhanced, both through aggregation and distribution computations between the specific streams and the shared stream. Furthermore, to weaken the influence of the background and strengthen the identification of crowd regions, we combine multi-scale crowd feature extraction with region recognition. Multiple output layers are added along the propagation path of the multi-modal streams to provide multi-level supervision. Moreover, we replace the baseline training loss with the Bayesian loss, which supervises the expected count at each annotation point. Finally, comprehensive experiments on the RGBT-CC benchmark demonstrate the effectiveness of the proposed method.
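
The abstract names the Bayesian loss but gives no formula. As a rough illustration only, the sketch below follows the general idea of that loss as it is usually described in the crowd counting literature: each pixel's predicted density is split among the annotation points according to a Gaussian posterior, and each point's expected count is pushed toward 1. The function name `bayesian_loss`, the nested-list data layout, the value of `sigma`, and the handling of images without annotations are all assumptions of this sketch, not details taken from the paper.

```python
import math


def bayesian_loss(density, points, sigma=8.0):
    """Illustrative Bayesian-style counting loss (pure Python, hypothetical API).

    density: 2D list (H x W) of predicted per-pixel density values.
    points:  list of (row, col) annotated head positions.
    sigma:   assumed Gaussian bandwidth in pixels.
    """
    if not points:
        # Assumption of this sketch: with no annotations,
        # every unit of predicted density counts as error.
        return sum(abs(v) for row in density for v in row)

    expected = [0.0] * len(points)  # expected count per annotation point
    for r, row in enumerate(density):
        for c, value in enumerate(row):
            # Unnormalized Gaussian likelihood of this pixel under each point.
            lik = [
                math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2.0 * sigma ** 2))
                for pr, pc in points
            ]
            total = sum(lik)
            if total == 0.0:
                continue  # pixel is too far from every point to contribute
            # The posterior over points sums to 1, so the pixel's density
            # is distributed among the annotation points.
            for n, l in enumerate(lik):
                expected[n] += (l / total) * value

    # Each annotated person should account for an expected count of 1.
    return sum(abs(1.0 - e) for e in expected)
```

In this sketch, an all-zero density map with two annotation points gives a loss of 2.0, since each point is missing one full count. A practical implementation would operate on tensors with automatic differentiation rather than Python lists.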

Pages: 601 - 608

Number of pages: 7

50 items in total

[1] Cross-modal collaborative representation and multi-level supervision for crowd counting
Li, Shufang
Hu, Zhengping
Zhao, Mengyao
Bi, Shuai
Sun, Zhe
SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (03) : 601 - 608
[2] Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting
Liu, Lingbo
Chen, Jiaqi
Wu, Hefeng
Li, Guanbin
Li, Chenglong
Lin, Liang
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4821 - 4831
[3] A cross-modal crowd counting method combining CNN and cross-modal transformer
Zhang, Shihui
Wang, Wei
Zhao, Weibo
Wang, Lei
Li, Qunpeng
IMAGE AND VISION COMPUTING, 2023, 129
[4] Multi-Level Cross-Modal Alignment for Image Clustering
Qiu, Liping
Zhang, Qin
Chen, Xiaojun
Cai, Shaotian
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14695 - 14703
[5] Multi-level adversarial attention cross-modal hashing
Wang, Benhui
Zhang, Huaxiang
Zhu, Lei
Nie, Liqiang
Liu, Li
SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
[6] CCANet: A Collaborative Cross-Modal Attention Network for RGB-D Crowd Counting
Liu, Yanbo
Cao, Guo
Shi, Boshan
Hu, Yingxiang
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 154 - 165
[7] Learning the cross-modal discriminative feature representation for RGB-T crowd counting
Li, He
Zhang, Shihui
Kong, Weihang
KNOWLEDGE-BASED SYSTEMS, 2022, 257
[8] Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval
Ma, Xinhong
Zhang, Tianzhu
Xu, Changsheng
IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3101 - 3114
[9] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
Ji, Zhenyan
Yao, Weina
Wei, Wei
Song, Houbing
Pi, Huaiyu
IEEE ACCESS, 2019, 7 : 23667 - 23674
[10] MULTI-LEVEL CONTRASTIVE LEARNING FOR HYBRID CROSS-MODAL RETRIEVAL
Zhao, Yiming
Lu, Haoyu
Zhao, Shiqi
Wu, Haoran
Lu, Zhiwu
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 6390 - 6394