An Effective Lightweight Crowd Counting Method Based on an Encoder-Decoder Network for Internet of Video Things

Cited by: 12
Authors
Yi, Jun [1]
Chen, Fan [1]
Shen, Zhilong [2]
Xiang, Yi [1]
Xiao, Shan [3]
Zhou, Wei [1]
Affiliations
[1] Chongqing Univ Sci & Technol, Coll Intelligent Technol & Engn, Chongqing 401331, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[3] Chongqing Coll Elect Engn, Inst Big Data & Optimizat, Chongqing 401331, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural network; crowd counting; edge computing; lightweight network
DOI
10.1109/JIOT.2023.3294727
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Crowd counting, an emerging Internet of Video Things (IoVT) application, is a computer vision task that estimates the number of heads in a crowded scene. In recent years, it has attracted increasing attention from academia and industry because of its potential value in public safety and urban planning. However, bridging the gap between the increasingly heavy and complex network architectures used to pursue high counting accuracy and the constrained computing and storage resources of edge computing environments has become a challenge. To address this issue, an effective lightweight crowd counting method based on an encoder-decoder network, named lightweight crowd counting network (LEDCrowdNet), is proposed to achieve an optimal tradeoff between counting performance and running speed for edge applications of IoVT. In particular, an improved MobileViT module is designed as the encoder to extract global-local crowd features at various scales. The decoder is composed of the adaptive multiscale large kernel attention module (AMLKA) and the lightweight counting atrous spatial pyramid pooling module (LC-ASPP), and the whole network can be trained end to end to produce the final density map. LEDCrowdNet reduces the number of floating point operations (FLOPs) without a significant drop in accuracy, which makes it suitable for deployment on two edge computing platforms (NVIDIA Jetson Xavier NX and Coral Edge TPU). Extensive experiments on five mainstream benchmarks (ShanghaiTech Part_A/B, UCF_CC_50, UCF-QNRF, WorldExpo'10, and RSOC datasets) verify the correctness and efficiency of the method.
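The abstract says the network outputs a density map, from which the head count is read off. The sketch below illustrates only that general density-map convention as it is commonly used in crowd counting; it is not LEDCrowdNet code. The function names, the fixed Gaussian `sigma`, and the window `radius` are assumptions made for illustration. Each annotated head is spread into a small Gaussian blob with unit mass, so summing the map recovers the count.

```python
import math


def density_map(heads, height, width, sigma=2.0, radius=6):
    """Build a density map whose sum equals the number of annotated heads.

    `heads` is a list of (row, col) pixel positions inside the image.
    Hypothetical helper for illustration; sigma and radius are assumed values.
    """
    density = [[0.0] * width for _ in range(height)]
    for row, col in heads:
        # Clip the Gaussian window to the image bounds.
        r0, r1 = max(0, row - radius), min(height - 1, row + radius)
        c0, c1 = max(0, col - radius), min(width - 1, col + radius)
        weights = {
            (r, c): math.exp(-((r - row) ** 2 + (c - col) ** 2) / (2 * sigma ** 2))
            for r in range(r0, r1 + 1)
            for c in range(c0, c1 + 1)
        }
        total = sum(weights.values())
        # Renormalise so each head contributes exactly one unit of mass,
        # even when the window is cut off by the image border.
        for (r, c), w in weights.items():
            density[r][c] += w / total
    return density


def count_from_density(density):
    """The estimated count is the sum over all pixels of the density map."""
    return sum(sum(row) for row in density)


# Three annotated heads, one of them in the image corner.
heads = [(5, 5), (10, 20), (0, 0)]
dm = density_map(heads, height=16, width=32)
estimated = count_from_density(dm)
```

In a trained counting network the density map is predicted from the image rather than built from annotations, but the count is obtained the same way, by summing the predicted map.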
Pages: 3082-3094
Page count: 13