Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain

Cited by: 24
Authors
Husain, Farzad [1]
Dellen, Babette [2]
Torras, Carme [1]
Affiliations
[1] UPC, CSIC, Inst Robot & Informat Ind, Barcelona 08028, Spain
[2] Hsch Koblenz, RheinAhrCampus, D-53424 Remagen, Germany
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2016 / Vol. 1 / No. 2
Keywords
Computer vision for automation; recognition; visual learning
DOI
10.1109/LRA.2016.2529686
Chinese Library Classification (CLC) number
TP24 [Robotics technology]
Subject classification codes
080202; 1405
Abstract
Hand-crafted feature functions are usually designed based on domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple, yet robust, 2-D convolutional neural network extended to a concatenated 3-D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2-D convolutional neural network allows us to exploit, as a descriptor, a pretrained network that yielded the best results on the largest and most challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmark video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time, without requiring any preprocessing (e.g., optic flow) or a priori knowledge of data capture (e.g., camera motion estimation), which makes our approach more general and flexible than other approaches. Our implementation is publicly available.
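The abstract describes the model only at a high level. As a loose illustration of the general idea of extending per-frame 2-D features into the spatio-temporal domain, the pure-Python sketch below applies one shared 2-D filter to every frame, keeps the responses in time order as a T x H x W volume, and then convolves along the time axis (a factorized spatio-temporal convolution). This is an assumption for illustration, not the authors' architecture: the function names, the hand-set kernels, and the single-channel layout are all hypothetical, whereas the paper's 2-D stage is a pretrained CNN with learned filters.

```python
from typing import List

# A frame (or a per-frame feature map) stored as rows of floats: H x W.
Frame = List[List[float]]


def conv2d_valid(frame: Frame, kernel: Frame) -> Frame:
    """2-D 'valid' cross-correlation (stride 1, no padding) of one frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(frame) - kh + 1, len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Frame) -> Frame:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def temporal_conv(volume: List[Frame], weights: List[float]) -> List[Frame]:
    """'Valid' convolution along the time axis of a T x H x W volume.

    Output map s is sum_k weights[k] * volume[s + k], so T - len(weights) + 1 maps are returned.
    """
    t = len(weights)
    h, w = len(volume[0]), len(volume[0][0])
    return [
        [
            [sum(weights[k] * volume[s + k][i][j] for k in range(t)) for j in range(w)]
            for i in range(h)
        ]
        for s in range(len(volume) - t + 1)
    ]


def spatio_temporal_features(
    video: List[Frame], spatial_kernel: Frame, temporal_weights: List[float]
) -> List[Frame]:
    # Stage 1: the same 2-D filter is applied independently to every frame
    # (a hypothetical stand-in for a pretrained 2-D CNN).
    per_frame = [relu(conv2d_valid(frame, spatial_kernel)) for frame in video]
    # Stage 2: the per-frame responses, kept in time order, form a T x H' x W'
    # volume; a temporal filter then mixes neighboring frames.
    return temporal_conv(per_frame, temporal_weights)


# Usage: 4 constant 3x3 frames whose brightness rises by 1 per frame.
video = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]
features = spatio_temporal_features(video, [[1.0, 1.0], [1.0, 1.0]], [-1.0, 1.0])
print(len(features), features[0])  # → 3 [[4.0, 4.0], [4.0, 4.0]]
```

With temporal weights of [-1.0, 1.0], the second stage behaves like frame differencing, which hints at how motion-sensitive responses can arise from stacked raw-frame features without computing optic flow; a trained network would learn such filters rather than having them set by hand.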
Pages: 984-991
Number of pages: 8