Deep Temporal Feature Encoding for Action Recognition

Citations: 0
Authors
Li, Lin [1, 2, 4]
Zhang, Zhaoxiang [1, 2, 3, 4]
Huang, Yan [2, 4]
Wang, Liang [2, 3, 4]
Affiliations
[1] CASIA, Res Ctr Brain Inspired Intelligence, Beijing, Peoples R China
[2] CASIA, Natl Lab Pattern Recognit, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
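Source
2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2018
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; National Key Research and Development Program of China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract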
Human action recognition is an important task in computer vision, and deep learning methods for video action recognition have developed rapidly in recent years. A popular approach is the family of two-stream methods, which consider both the spatial and the temporal modality. These methods typically take sparsely sampled frames as input and video-level labels as supervision. Because of this sampling strategy, they are usually limited to processing short sequences, which can lead to confusion caused by partial observation of the action. In this paper, we propose a novel video feature representation method called Deep Temporal Feature Encoding (DTE), which aggregates frame-level features into a robust, global video-level representation. First, we sample a sufficient number of RGB frames and optical flow stacks across the whole video. Then, we use a deep temporal feature encoding layer to construct a strong video feature. Finally, we train the model end to end so that the video representation is global and sequence-aware. Comprehensive experiments are conducted on two public datasets, HMDB51 and UCF101. The results show that DTE achieves performance competitive with the state of the art on both datasets.
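The abstract gives no implementation details for the sampling step or the encoding layer, so the following is only a minimal sketch of the general idea: pick frames spread across the whole video, then collapse the frame-level features into one video-level vector. The function names `sample_frame_indices` and `mean_pool` are hypothetical, and mean pooling is a deliberately simple stand-in, not the DTE layer itself.

```python
from typing import List, Sequence


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick one frame index from the centre of each of num_samples equal segments.

    Assumption: uniform segment-centre sampling; the abstract only says that
    frames are sampled across the whole video.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    return [min(num_frames - 1, int(segment * (i + 0.5))) for i in range(num_samples)]


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level feature vectors into one video-level vector.

    Assumption: a stand-in aggregator, not the encoding layer from the paper.
    """
    if not frame_features:
        raise ValueError("need at least one frame feature")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must have the same length")
    n = len(frame_features)
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


if __name__ == "__main__":
    indices = sample_frame_indices(num_frames=100, num_samples=4)
    # Toy two-dimensional "features", one per sampled frame.
    features = [[float(i), float(i) * 2.0] for i in indices]
    print(indices, mean_pool(features))
```

Mean pooling ignores frame order, so it yields a global but not a sequence-aware representation; the abstract attributes the sequence-aware property to the learned encoding layer and end-to-end training.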
Pages: 1109-1114
Number of pages: 6