Deep Temporal Feature Encoding for Action Recognition

Cited by: 0
Authors
Li, Lin [1 ,2 ,4 ]
Zhang, Zhaoxiang [1 ,2 ,3 ,4 ]
Huang, Yan [2 ,4 ]
Wang, Liang [2 ,3 ,4 ]
Affiliations
[1] CASIA, Res Ctr Brain Inspired Intelligence, Beijing, Peoples R China
[2] CASIA, Natl Lab Pattern Recognit, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2018
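Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; National Key Research and Development Program of China;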
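Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code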
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition is an important task in computer vision. Recently, deep learning methods for video action recognition have developed rapidly. A popular approach is the two-stream method, which takes both spatial and temporal modalities into consideration. These methods typically take sparsely sampled frames as input and video labels as supervision. Because of this sampling strategy, they are usually limited to processing short sequences, which can cause problems such as confusion from partial observation. In this paper we propose a novel video feature representation method called Deep Temporal Feature Encoding (DTE), which aggregates frame-level features into a robust, global video-level representation. First, we sample a sufficient number of RGB frames and optical flow stacks across the whole video. Then we use a deep temporal feature encoding layer to construct a strong video feature. Finally, end-to-end training is applied so that the video representation is global and sequence-aware. Comprehensive experiments are conducted on two public datasets, HMDB51 and UCF101. The results show that DTE achieves performance competitive with the state of the art on both datasets.
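The sampling and aggregation steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how frames are chosen or how the encoding layer works, so the function names `sample_indices` and `aggregate` are hypothetical, the segment-centre sampling rule is an assumption, and plain mean pooling stands in as a placeholder for the DTE layer. Mean pooling ignores frame order, so unlike the method in the abstract it is not sequence-aware.

```python
from typing import List, Sequence


def sample_indices(num_frames: int, num_segments: int) -> List[int]:
    """Pick one frame index from the centre of each of `num_segments`
    equal-length spans, so the samples cover the whole video.

    Assumption: the abstract only says frames are sampled "across the
    whole video"; centre-of-segment sampling is one simple way to do that.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    span = num_frames / num_segments
    return [min(num_frames - 1, int((i + 0.5) * span))
            for i in range(num_segments)]


def aggregate(features: Sequence[Sequence[float]]) -> List[float]:
    """Collapse frame-level feature vectors into one video-level vector.

    Placeholder only: this averages the vectors element-wise. The paper's
    encoding layer is learned end to end and is not described here.
    """
    if not features:
        raise ValueError("need at least one frame-level feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


if __name__ == "__main__":
    # Toy usage: a 100-frame video sampled at 4 positions.
    idx = sample_indices(100, 4)
    # Hypothetical 2-dimensional features for two sampled frames.
    video_vector = aggregate([[1.0, 2.0], [3.0, 4.0]])
    print(idx, video_vector)
```

In a two-stream setting the same sampling would be applied once for RGB frames and once for optical flow stacks, with a separate aggregation per stream.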
Pages: 1109 - 1114
Number of pages: 6
Related papers
  • [1] Squeeze-and-Excitation on Spatial and Temporal Deep Feature Space for Action Recognition
    An, Gaoyun
    Zhou, Wen
    Wu, Yuxuan
    Zheng, Zhenxing
    Liu, Yongwen
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 648 - 653
  • [2] A Discriminative Deep Model With Feature Fusion and Temporal Attention for Human Action Recognition
    Yu, Jiahui
    Gao, Hongwei
    Yang, Wei
    Jiang, Yueqiu
    Chin, Weihong
    Kubota, Naoyuki
    Ju, Zhaojie
    IEEE ACCESS, 2020, 8 : 43243 - 43255
  • [3] Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition
    Papadopoulos, Konstantinos
    Ghorbel, Enjie
    Aouada, Djamila
    Ottersten, Bjoern
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 452 - 458
  • [4] Efficient feature extraction, encoding and classification for action recognition
    Kantorov, Vadim
    Laptev, Ivan
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 2593 - 2600
  • [5] Spatio Temporal Feature Evaluation for Action Recognition
    Umakanthan, Sabanadesan
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    Wark, Tim
    2012 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING TECHNIQUES AND APPLICATIONS (DICTA), 2012,
  • [6] Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain
    Husain, Farzad
    Dellen, Babette
    Torras, Carme
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2016, 1 (02): : 984 - 991
  • [7] Action recognition method of spatio-temporal feature fusion deep learning network
    Pei, Xiaomin
    Fan, Huijie
    Tang, Yandong
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2018, 47 (02):
  • [8] Temporal sparse feature auto-combination deep network for video action recognition
    Wang, Qicong
    Gong, Dingxi
    Qi, Man
    Shen, Yehu
    Lei, Yunqi
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [9] Data-driven spatio-temporal RGBD feature encoding for action recognition in operating rooms
    Twinanda, Andru P.
    Alkan, Emre O.
    Gangi, Afshin
    de Mathelin, Michel
    Padoy, Nicolas
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2015, 10 (06) : 737 - 747
  • [10] Data-driven spatio-temporal RGBD feature encoding for action recognition in operating rooms
    Andru P. Twinanda
    Emre O. Alkan
    Afshin Gangi
    Michel de Mathelin
    Nicolas Padoy
    International Journal of Computer Assisted Radiology and Surgery, 2015, 10 : 737 - 747