Low-Cost Embedded System Using Convolutional Neural Networks-Based Spatiotemporal Feature Map for Real-Time Human Action Recognition

Cited by: 8
Authors
Kim, Jinsoo [1 ]
Cho, Jeongho [1 ]
Affiliations
[1] Soonchunhyang Univ, Dept Elect Engn, Asan 31538, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 11
Funding
National Research Foundation of Singapore;
Keywords
CNN; human action recognition; spatiotemporal feature; embedded system; real-time; VIDEO SURVEILLANCE;
DOI
10.3390/app11114940
CLC Classification Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Research on video data faces the difficulty of extracting temporal as well as spatial features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Although action recognition performance has improved, model complexity still limits real-time operation. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the frames that compose the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by the frame change rate, transformed into spatiotemporal features, and fed into a multilayer perceptron, which has lower complexity than other HAR models; thus, the method is well suited to a single embedded system connected to CCTV. Evaluations of recognition accuracy and data processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than an HAR model using long short-term memory when only a small number of video frames is available, and the fast data processing speed confirmed the feasibility of real-time operation. In addition, the proposed weighted mean-based HAR model was verified on a Jetson Nano to confirm its applicability to low-cost GPU-based embedded systems.
Pages: 15
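The abstract describes a pipeline in which per-frame CNN feature maps are weighted-averaged by the frame change rate and the resulting spatiotemporal feature is classified by a multilayer perceptron. Below is a minimal sketch of that idea, not the authors' code: the MobileNetV2 backbone, the mean absolute pixel difference as the frame-change measure, the layer sizes, and the clip length are all assumptions introduced here for illustration.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# per-frame spatial CNN features, weighted by frame-change rate,
# averaged into one spatiotemporal feature, then an MLP classifier.
import torch
import torch.nn as nn
from torchvision import models


class WeightedMeanHAR(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 1280):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)        # assumed lightweight CNN backbone
        self.cnn = nn.Sequential(backbone.features,          # spatial feature maps
                                 nn.AdaptiveAvgPool2d(1),
                                 nn.Flatten())
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 256),   # low-complexity classifier head
                                 nn.ReLU(),
                                 nn.Linear(256, num_classes))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) sequence of video frames
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)

        # frame-change rate: mean absolute difference between consecutive frames
        diff = (frames[:, 1:] - frames[:, :-1]).abs().mean(dim=(2, 3, 4))
        change = torch.cat([diff[:, :1], diff], dim=1)        # pad weight for the first frame
        weights = change / change.sum(dim=1, keepdim=True).clamp_min(1e-6)

        # weighted mean of spatial features -> single spatiotemporal feature vector
        st_feat = (feats * weights.unsqueeze(-1)).sum(dim=1)
        return self.mlp(st_feat)


if __name__ == "__main__":
    model = WeightedMeanHAR(num_classes=101)                 # e.g., UCF-101
    logits = model(torch.randn(2, 8, 3, 224, 224))           # 8 frames per clip (assumed)
    print(logits.shape)                                      # torch.Size([2, 101])
```

Because the temporal information is reduced to scalar per-frame weights, only one CNN pass per frame and one small MLP are needed, which is consistent with the abstract's claim of suitability for low-cost embedded GPUs such as the Jetson Nano.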