Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain

Times cited: 24
Authors
Husain, Farzad [1 ]
Dellen, Babette [2 ]
Torras, Carme [1 ]
Affiliations
[1] UPC, CSIC, Inst Robot & Informat Ind, Barcelona 08028, Spain
[2] Hsch Koblenz, RheinAhrCampus, D-53424 Remagen, Germany
Source
IEEE Robotics and Automation Letters
Keywords
Computer vision for automation; recognition; visual learning
DOI
10.1109/LRA.2016.2529686
Chinese Library Classification (CLC) number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
Hand-crafted feature functions are usually designed based on domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple, yet robust, 2-D convolutional neural network extended to a concatenated 3-D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2-D convolutional neural network allows us to exploit, as a descriptor, a pretrained network that yielded the best results on the largest and most challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmark video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge about data capture (e.g., camera motion estimation), which makes our approach more general and flexible than other approaches. Our implementation is made available.
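The abstract only outlines the architecture. As a rough illustration of the general idea of carrying a 2-D convolutional feature extractor into the spatio-temporal domain, the sketch below applies one shared 2-D filter to every frame, concatenates the resulting maps along a time axis, and runs a 3-D convolution over that stack. This is a generic toy under stated assumptions, not the authors' network: the function names, single-channel filters, ReLU nonlinearity, and random weights are all hypothetical, and a real model would put a pretrained multi-channel 2-D CNN where `kernel2d` is used here.

```python
import numpy as np


def conv2d_valid(frame, kernel):
    """Single-channel 'valid' 2-D cross-correlation, as computed by CNN layers."""
    kh, kw = kernel.shape
    h, w = frame.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return out


def conv3d_valid(volume, kernel):
    """Single-channel 'valid' 3-D cross-correlation over a (T, H, W) volume."""
    kt, kh, kw = kernel.shape
    t, h, w = volume.shape
    out = np.empty((t - kt + 1, h - kh + 1, w - kw + 1))
    for a in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[a, i, j] = np.sum(volume[a:a + kt, i:i + kh, j:j + kw] * kernel)
    return out


def spatio_temporal_features(video, kernel2d, kernel3d):
    """Hypothetical 2-D-then-3-D feature extractor for a (T, H, W) clip."""
    # Apply the same 2-D filter (stand-in for a pretrained 2-D CNN) to each
    # frame independently, followed by a ReLU.
    per_frame = [np.maximum(conv2d_valid(f, kernel2d), 0.0) for f in video]
    # Concatenate the per-frame maps along a new time axis: (T, H', W').
    volume = np.stack(per_frame, axis=0)
    # A 3-D convolution then mixes information across neighbouring frames.
    return np.maximum(conv3d_valid(volume, kernel3d), 0.0)


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 16))  # 8 frames of 16x16 pixels
features = spatio_temporal_features(
    clip, rng.standard_normal((3, 3)), rng.standard_normal((3, 3, 3))
)
print(features.shape)
```

For an 8-frame 16×16 clip with a 3×3 spatial filter and a 3×3×3 spatio-temporal filter, the output has shape (6, 12, 12): each "valid" convolution trims the kernel size minus one along every axis it slides over. Keeping the first stage purely 2-D is what lets a network pretrained on still images be reused unchanged before any temporal modelling takes place.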
Pages: 984-991
Page count: 8