Spatio-temporal multi-scale motion descriptor from a spatially-constrained decomposition for online action recognition

Cited: 6
Authors
Martinez, Fabio [1 ,2 ,3 ]
Manzanera, Antoine [2 ]
Romero, Eduardo [1 ]
Affiliations
[1] Univ Nacl Colombia, CIM LAB, Bogota, Colombia
[2] Univ Paris Saclay, Robot Vis U2IS, ENSTA ParisTech, Palaiseau, France
[3] UIS, Escuela Ingn Sistemas & Informat, Bucaramanga, Colombia
Keywords
spatiotemporal phenomena; image motion analysis; image sequences; image representation; image classification; support vector machines; spatio-temporal multiscale motion descriptor; spatially-constrained decomposition; online action recognition; human activity online classification; multiscale dense optical flow; regions of interest; RoI; small overlapped subregion spatial representation; flow orientation histogram; orientation histogram temporal history; support vector machine; ViSOR dataset; short sequence global classification; average per-frame accuracy; human activity recognition; REPRESENTATION; HISTOGRAMS;
DOI
10.1049/iet-cvi.2016.0055
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study presents a spatio-temporal motion descriptor that is computed from a spatially-constrained decomposition and applied to the online classification and recognition of human activities. The method starts by computing a dense optical flow without explicit spatial regularisation. Potential human actions are detected at each frame as spatially consistent moving regions of interest (RoIs). Each of these RoIs is then sequentially partitioned to obtain a spatial representation of small overlapped subregions of different sizes. Each of these region parts is characterised by a set of flow orientation histograms. A particular RoI is then described along time by a set of recursively calculated statistics that collect information from the temporal history of the orientation histograms, forming the action descriptor. At any time, the whole descriptor can be extracted and labelled by a previously trained support vector machine. The method was evaluated on three public datasets: (i) the ViSOR dataset was used for global classification, obtaining an average accuracy of 95%, and for recognition in long sequences, achieving an average per-frame accuracy of 92.3%; (ii) the KTH dataset was used for global classification; and (iii) the UT datasets were used for the recognition task, obtaining an average accuracy of 80% (at frame rate).
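The two core ingredients of the descriptor sketched in the abstract, per-region flow orientation histograms and recursively updated temporal statistics, can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the function names, the bin count of 8, and the use of an exponentially-weighted running mean as the recursive statistic are all assumptions made here for illustration.

```python
import numpy as np

def flow_orientation_histogram(flow, n_bins=8):
    """Quantise a dense optical-flow field (H, W, 2) into a
    magnitude-weighted orientation histogram, L1-normalised.
    (Illustrative stand-in for the per-subregion histograms.)"""
    angles = np.arctan2(flow[..., 1], flow[..., 0])          # in [-pi, pi]
    mags = np.linalg.norm(flow, axis=-1)                     # flow magnitude
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mags.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

class RecursiveHistogramStats:
    """Exponentially-weighted running mean of per-frame histograms,
    assumed here as a simple stand-in for the paper's recursively
    calculated temporal statistics; updatable at any frame, which is
    what permits online (anytime) classification."""
    def __init__(self, n_bins=8, alpha=0.5):
        self.mean = np.zeros(n_bins)
        self.alpha = alpha                                   # update rate

    def update(self, hist):
        # Recursive update: new frame blended into the running statistic.
        self.mean = (1.0 - self.alpha) * self.mean + self.alpha * hist
        return self.mean
```

Concatenating such statistics over all overlapped subregions would then yield a fixed-length vector that an SVM (e.g. scikit-learn's `SVC`) could label at any frame.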
Pages: 541-549
Number of pages: 9
Related papers
11 records in total
  • [1] Human action recognition in immersive virtual reality based on multi-scale spatio-temporal attention network
    Xiao, Zhiyong
    Chen, Yukun
    Zhou, Xinlei
    He, Mingwei
    Liu, Li
    Yu, Feng
    Jiang, Minghua
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (05)
  • [2] A Local 3-D Motion Descriptor for Multi-View Human Action Recognition from 4-D Spatio-Temporal Interest Points
    Holte, Michael B.
    Chakraborty, Bhaskar
    Gonzalez, Jordi
    Moeslund, Thomas B.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2012, 6 (05) : 553 - 565
  • [3] Learning spatio-temporal features for action recognition from the side of the video
    Lishen Pei
    Mao Ye
    Xuezhuan Zhao
    Tao Xiang
    Tao Li
    Signal, Image and Video Processing, 2016, 10 : 199 - 206
  • [4] Learning spatio-temporal features for action recognition from the side of the video
    Pei, Lishen
    Ye, Mao
    Zhao, Xuezhuan
    Xiang, Tao
    Li, Tao
    SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (01) : 199 - 206
  • [5] Human Activity Recognition: A Spatio-temporal Image Encoding of 3D Skeleton Data for Online Action Detection
    Mokhtari, Nassim
    Nedelec, Alexis
    De Loor, Pierre
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022, : 448 - 455
  • [6] Action recognition using global spatio-temporal features derived from sparse representations
    Somasundaram, Guruprasad
    Cherian, Anoop
    Morellas, Vassilios
    Papanikolopoulos, Nikolaos
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2014, 123 : 1 - 13
  • [7] Multi-scale temporal feature-based dense convolutional network for action recognition
    Li, Xiaoqiang
    Xie, Miao
    Zhang, Yin
    Li, Jide
    JOURNAL OF ELECTRONIC IMAGING, 2020, 29 (06)
  • [8] Online view-invariant human action recognition using rgb-d spatio-temporal matrix
    Hsu, Yen-Pin
    Liu, Chengyin
    Chen, Tzu-Yang
    Fu, Li-Chen
    PATTERN RECOGNITION, 2016, 60 : 215 - 226
  • [9] Spatio-temporal attention modules in orientation-magnitude-response guided multi-stream CNNs for human action recognition
    Khezerlou, Fatemeh
    Baradarani, Aryaz
    Balafar, Mohammad Ali
    Maev, Roman Gr.
    IET IMAGE PROCESSING, 2024, 18 (09) : 2372 - 2388
  • [10] Human action recognition based on spatio-temporal three-dimensional scattering transform descriptor and an improved VLAD feature encoding algorithm
    Lin, Bo
    Fang, Bin
    Yang, Weibin
    Qian, Jiye
    NEUROCOMPUTING, 2019, 348 : 145 - 157