Low-Cost Embedded System Using Convolutional Neural Networks-Based Spatiotemporal Feature Map for Real-Time Human Action Recognition

Cited by: 8
Authors
Kim, Jinsoo [1 ]
Cho, Jeongho [1 ]
Affiliations
[1] Soonchunhyang Univ, Dept Elect Engn, Asan 31538, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 11
Funding
National Research Foundation of Singapore;
Keywords
CNN; human action recognition; spatiotemporal feature; embedded system; real-time; VIDEO SURVEILLANCE;
DOI
10.3390/app11114940
CLC Classification Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Research on video data faces the difficulty of extracting temporal as well as spatial features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Although action recognition performance has improved, model complexity still limits real-time operation. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the frames that compose the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by the frame change rate, transformed into spatiotemporal features, and fed into a multilayer perceptron, which has lower complexity than other HAR models; thus, the method is well suited to a single embedded system connected to CCTV. Evaluations of recognition accuracy and data processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than an HAR model using long short-term memory when only a small number of video frames is available, and the fast data processing speed confirmed the feasibility of real-time operation. In addition, the proposed weighted mean-based HAR model was verified on a Jetson Nano to confirm its applicability to low-cost GPU-based embedded systems.
Pages: 15
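The abstract describes a pipeline in which per-frame CNN feature maps are weighted-averaged by the frame change rate and the resulting spatiotemporal feature is classified by a multilayer perceptron. Below is a minimal sketch of that idea, not the authors' code: the MobileNetV2 backbone, the mean absolute pixel difference as the frame-change measure, the layer sizes, and the clip length are all assumptions introduced here for illustration.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# per-frame spatial CNN features, weighted by frame-change rate,
# averaged into one spatiotemporal feature, then an MLP classifier.
import torch
import torch.nn as nn
from torchvision import models


class WeightedMeanHAR(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 1280):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)        # assumed lightweight CNN backbone
        self.cnn = nn.Sequential(backbone.features,          # spatial feature maps
                                 nn.AdaptiveAvgPool2d(1),
                                 nn.Flatten())
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 256),   # low-complexity classifier head
                                 nn.ReLU(),
                                 nn.Linear(256, num_classes))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) sequence of video frames
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)

        # frame-change rate: mean absolute difference between consecutive frames
        diff = (frames[:, 1:] - frames[:, :-1]).abs().mean(dim=(2, 3, 4))
        change = torch.cat([diff[:, :1], diff], dim=1)        # pad weight for the first frame
        weights = change / change.sum(dim=1, keepdim=True).clamp_min(1e-6)

        # weighted mean of spatial features -> single spatiotemporal feature vector
        st_feat = (feats * weights.unsqueeze(-1)).sum(dim=1)
        return self.mlp(st_feat)


if __name__ == "__main__":
    model = WeightedMeanHAR(num_classes=101)                 # e.g., UCF-101
    logits = model(torch.randn(2, 8, 3, 224, 224))           # 8 frames per clip (assumed)
    print(logits.shape)                                      # torch.Size([2, 101])
```

Because the temporal information is reduced to scalar per-frame weights, only one CNN pass per frame and one small MLP are needed, which is consistent with the abstract's claim of suitability for low-cost embedded GPUs such as the Jetson Nano.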