A novel framework for crowd counting using video and audio

Cited by: 1
Authors
Zou, Yi [1]
Min, Weidong [1, 2, 3]
Zhao, Haoyu [1]
Han, Qing [1]
Affiliations
[1] Nanchang Univ, Sch Math & Comp Sci, Nanchang 330031, Peoples R China
[2] Nanchang Univ, Inst Metaverse, Nanchang 330031, Peoples R China
[3] Jiangxi Key Lab Smart City, Nanchang 330031, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Crowd counting; VACCNet; Video Crowd Counting; Multiple direction audio assistance;
DOI
10.1016/j.compeleceng.2023.108754
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Crowd counting is important in many areas. Existing methods are inaccurate in scenes with strong perspective effects and in low-illumination scenes. In addition, existing audio-assisted methods use only local audio, which cannot capture the spatial features of sound arriving from all directions. To alleviate these problems, a novel framework named Video and Audio-assisted Crowd Counting Network (VACCNet) is proposed. The framework consists of two submodules: a Video Crowd Counting (VCC) module and an Audio-assisted Crowd Counting (ACC) module. The visual features from the VCC module and the fused audio features from the ACC module are combined to produce the final density map. To evaluate VACCNet, a new self-collected dataset named multiPle dIrection Assistance couNting netwOrk (PIANO) is built. Experimental results on existing benchmarks and PIANO show that the proposed method improves on conventional methods by 14.23% on average.
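
The data flow described in the abstract (per-direction audio features fused, combined with visual features, then mapped to a density map whose sum is the count) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names, the averaging used to fuse audio directions, the concatenation, and the single linear layer are hypothetical and are not taken from VACCNet. Only the last step, summing a density map to obtain a count, is standard practice in density-based crowd counting.

```python
# Toy illustration only: names, shapes, and fusion rules are hypothetical,
# not the layers or operations used by VACCNet.

def fuse_audio(direction_features):
    """Average audio feature vectors from several directions (assumed fusion rule)."""
    n = len(direction_features)
    dim = len(direction_features[0])
    return [sum(f[i] for f in direction_features) / n for i in range(dim)]


def combine(visual_map, audio_vector, weights, bias=0.0):
    """Append the fused audio vector to the visual features at every spatial
    position, then apply one linear layer followed by ReLU to get a
    non-negative density value per position (assumed combination rule)."""
    density = []
    for row in visual_map:
        out_row = []
        for visual_vector in row:
            features = list(visual_vector) + list(audio_vector)
            value = sum(w * x for w, x in zip(weights, features)) + bias
            out_row.append(max(0.0, value))
        density.append(out_row)
    return density


def count_from_density(density):
    """The estimated crowd count is the sum over the density map."""
    return sum(sum(row) for row in density)


# A 2x2 visual feature map with 2 channels per position (made-up values).
visual_map = [[[1.0, 0.0], [0.5, 0.5]],
              [[0.0, 0.0], [2.0, 1.0]]]
# Audio features from four directions, 2 values each (made-up values).
audio_directions = [[0.2, 0.4], [0.4, 0.0], [0.0, 0.0], [0.2, 0.4]]
# Weights for the 2 visual + 2 audio channels (made-up values).
weights = [0.5, 0.5, 1.0, 0.0]

fused_audio = fuse_audio(audio_directions)
density_map = combine(visual_map, fused_audio, weights)
estimated_count = count_from_density(density_map)
print(estimated_count)
```

In a real network the fusion and combination steps would be learned layers operating on high-dimensional tensors; the sketch only shows how the pieces named in the abstract relate to one another.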
Pages: 12