Deep Temporal Feature Encoding for Action Recognition

Cited by: 0
Authors
Li, Lin [1 ,2 ,4 ]
Zhang, Zhaoxiang [1 ,2 ,3 ,4 ]
Huang, Yan [2 ,4 ]
Wang, Liang [2 ,3 ,4 ]
Affiliations
[1] CASIA, Res Ctr Brain Inspired Intelligence, Beijing, Peoples R China
[2] CASIA, Natl Lab Pattern Recognit, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2018
Funding
National Natural Science Foundation of China; National Key R&D Program of China; Beijing Natural Science Foundation;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition is an important task in computer vision, and deep learning methods for video action recognition have developed rapidly in recent years. A popular family of approaches, known as two-stream methods, takes both spatial and temporal modalities into consideration. These methods typically take sparsely sampled frames as input and use video labels as supervision. Because of this sampling strategy, they are limited to processing short sequences, which can lead to problems such as confusion caused by partial observation. In this paper, we propose a novel video feature representation method called Deep Temporal Feature Encoding (DTE), which aggregates frame-level features into a robust, global video-level representation. First, we sample sufficient RGB frames and optical flow stacks across the whole video. Then, we use a deep temporal feature encoding layer to construct a strong video feature. Finally, end-to-end training is applied so that the video representation is global and sequence-aware. Comprehensive experiments are conducted on two public datasets, HMDB51 and UCF101, and the results demonstrate that DTE achieves competitive state-of-the-art performance on both.
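The abstract describes aggregating many sampled frame-level features into a single video-level vector. As a minimal illustrative sketch only, assuming a simple learned-weight temporal pooling (the paper's actual deep temporal feature encoding layer is not specified in the abstract and may differ substantially), the aggregation step could look like:

```python
import numpy as np

def temporal_feature_encoding(frame_feats, weights=None):
    """Aggregate a (T, D) array of frame-level features into a (D,) video-level
    feature via weighted temporal pooling.

    frame_feats: features of T frames/flow stacks sampled across the whole video.
    weights: optional per-frame importance weights (hypothetical stand-in for a
             learned encoding layer); defaults to uniform average pooling.
    """
    T, _ = frame_feats.shape
    if weights is None:
        weights = np.ones(T)          # uniform pooling as a default
    weights = weights / weights.sum()  # normalize to a convex combination
    return weights @ frame_feats       # (T,) @ (T, D) -> (D,)

# Toy example: 8 sampled frames with 4-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))
video_feat = temporal_feature_encoding(feats)
print(video_feat.shape)
```

In an end-to-end system the pooling weights would be produced by a trainable layer and optimized jointly with the frame-level backbone, which is what makes the resulting representation sequence-aware.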
Pages: 1109-1114
Page count: 6