Spatio-temporal multi-scale motion descriptor from a spatially-constrained decomposition for online action recognition

Cited: 6
Authors
Martinez, Fabio [1 ,2 ,3 ]
Manzanera, Antoine [2 ]
Romero, Eduardo [1 ]
Affiliations
[1] Univ Nacl Colombia, CIM LAB, Bogota, Colombia
[2] Univ Paris Saclay, Robot Vis U2IS, ENSTA ParisTech, Palaiseau, France
[3] UIS, Escuela Ingn Sistemas & Informat, Bucaramanga, Colombia
Keywords
spatiotemporal phenomena; image motion analysis; image sequences; image representation; image classification; support vector machines; spatio-temporal multiscale motion descriptor; spatially-constrained decomposition; online action recognition; human activity online classification; multiscale dense optical flow; regions of interest; RoI; small overlapped subregion spatial representation; flow orientation histogram; orientation histogram temporal history; support vector machine; ViSOR dataset; short sequence global classification; average per-frame accuracy; human activity recognition; REPRESENTATION; HISTOGRAMS;
DOI
10.1049/iet-cvi.2016.0055
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study presents a spatio-temporal motion descriptor that is computed from a spatially-constrained decomposition and applied to the online classification and recognition of human activities. The method starts by computing a dense optical flow without explicit spatial regularisation. Potential human actions are detected at each frame as spatially consistent moving regions of interest (RoIs). Each of these RoIs is then sequentially partitioned to obtain a spatial representation of small overlapped subregions of different sizes. Each of these region parts is characterised by a set of flow orientation histograms. A particular RoI is then described along time by a set of recursively calculated statistics that collect information from the temporal history of the orientation histograms, forming the action descriptor. At any time, the whole descriptor can be extracted and labelled by a previously trained support vector machine. The method was evaluated on three public datasets: (i) the ViSOR dataset was used for global classification, obtaining an average accuracy of 95%, and for recognition in long sequences, achieving an average per-frame accuracy of 92.3%; (ii) the KTH dataset was used for global classification; and (iii) the UT datasets were used for the recognition task, obtaining an average accuracy of 80% (at frame rate).
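The two core ingredients of the descriptor sketched in the abstract, per-region flow orientation histograms and recursively updated temporal statistics, can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the function names, the bin count of 8, and the use of an exponentially-weighted running mean as the recursive statistic are all assumptions made here for illustration.

```python
import numpy as np

def flow_orientation_histogram(flow, n_bins=8):
    """Quantise a dense optical-flow field (H, W, 2) into a
    magnitude-weighted orientation histogram, L1-normalised.
    (Illustrative stand-in for the per-subregion histograms.)"""
    angles = np.arctan2(flow[..., 1], flow[..., 0])          # in [-pi, pi]
    mags = np.linalg.norm(flow, axis=-1)                     # flow magnitude
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mags.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

class RecursiveHistogramStats:
    """Exponentially-weighted running mean of per-frame histograms,
    assumed here as a simple stand-in for the paper's recursively
    calculated temporal statistics; updatable at any frame, which is
    what permits online (anytime) classification."""
    def __init__(self, n_bins=8, alpha=0.5):
        self.mean = np.zeros(n_bins)
        self.alpha = alpha                                   # update rate

    def update(self, hist):
        # Recursive update: new frame blended into the running statistic.
        self.mean = (1.0 - self.alpha) * self.mean + self.alpha * hist
        return self.mean
```

Concatenating such statistics over all overlapped subregions would then yield a fixed-length vector that an SVM (e.g. scikit-learn's `SVC`) could label at any frame.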
Pages: 541-549
Number of pages: 9
Related papers
11 records in total
  • [1] Human action recognition in immersive virtual reality based on multi-scale spatio-temporal attention network
    Xiao, Zhiyong
    Chen, Yukun
    Zhou, Xinlei
    He, Mingwei
    Liu, Li
    Yu, Feng
    Jiang, Minghua
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (05)
  • [2] A Local 3-D Motion Descriptor for Multi-View Human Action Recognition from 4-D Spatio-Temporal Interest Points
    Holte, Michael B.
    Chakraborty, Bhaskar
    Gonzalez, Jordi
    Moeslund, Thomas B.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2012, 6 (05) : 553 - 565
  • [3] Learning spatio-temporal features for action recognition from the side of the video
    Lishen Pei
    Mao Ye
    Xuezhuan Zhao
    Tao Xiang
    Tao Li
    Signal, Image and Video Processing, 2016, 10 : 199 - 206
  • [4] Learning spatio-temporal features for action recognition from the side of the video
    Pei, Lishen
    Ye, Mao
    Zhao, Xuezhuan
    Xiang, Tao
    Li, Tao
    SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (01) : 199 - 206
  • [5] Human Activity Recognition: A Spatio-temporal Image Encoding of 3D Skeleton Data for Online Action Detection
    Mokhtari, Nassim
    Nedelec, Alexis
    De Loor, Pierre
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022, : 448 - 455
  • [6] Action recognition using global spatio-temporal features derived from sparse representations
    Somasundaram, Guruprasad
    Cherian, Anoop
    Morellas, Vassilios
    Papanikolopoulos, Nikolaos
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2014, 123 : 1 - 13
  • [7] Multi-scale temporal feature-based dense convolutional network for action recognition
    Li, Xiaoqiang
    Xie, Miao
    Zhang, Yin
    Li, Jide
    JOURNAL OF ELECTRONIC IMAGING, 2020, 29 (06)
  • [8] Online view-invariant human action recognition using rgb-d spatio-temporal matrix
    Hsu, Yen-Pin
    Liu, Chengyin
    Chen, Tzu-Yang
    Fu, Li-Chen
    PATTERN RECOGNITION, 2016, 60 : 215 - 226
  • [9] Spatio-temporal attention modules in orientation-magnitude-response guided multi-stream CNNs for human action recognition
    Khezerlou, Fatemeh
    Baradarani, Aryaz
    Balafar, Mohammad Ali
    Maev, Roman Gr.
    IET IMAGE PROCESSING, 2024, 18 (09) : 2372 - 2388
  • [10] Human action recognition based on spatio-temporal three-dimensional scattering transform descriptor and an improved VLAD feature encoding algorithm
    Lin, Bo
    Fang, Bin
    Yang, Weibin
    Qian, Jiye
    NEUROCOMPUTING, 2019, 348 : 145 - 157