Deep Temporal Feature Encoding for Action Recognition

Cited by: 0

Authors:

Li, Lin ^{[1,2,4]}

Zhang, Zhaoxiang ^{[1,2,3,4]}

Huang, Yan ^{[2,4]}

Wang, Liang ^{[2,3,4]}

Affiliations:

[1] CASIA, Res Ctr Brain Inspired Intelligence, Beijing, Peoples R China

[2] CASIA, Natl Lab Pattern Recognit, Beijing, Peoples R China

[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China

[4] Univ Chinese Acad Sci, Beijing, Peoples R China

Source:

2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2018

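Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China; Beijing Natural Science Foundation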

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human action recognition is an important task in computer vision, and deep learning methods for video action recognition have developed rapidly in recent years. A popular approach is the two-stream method, which takes both the spatial and the temporal modality into account. Such methods usually take sparsely sampled frames as input and video-level labels as supervision. Because of this sampling strategy, they are typically limited to processing short sequences, which can cause problems such as confusion arising from partial observation of the action. In this paper we propose a novel video feature representation method called Deep Temporal Feature Encoding (DTE), which aggregates frame-level features into a robust, global video-level representation. First, we sample a sufficient number of RGB frames and optical flow stacks across the whole video. Then we use a deep temporal feature encoding layer to construct a strong video feature. Finally, we apply end-to-end training so that the video representation is global and sequence-aware. Comprehensive experiments are conducted on two public datasets, HMDB51 and UCF101. The results show that DTE achieves performance competitive with the state of the art on both datasets.
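The first and second steps in the abstract (sampling across the whole video, then aggregating frame-level features into one video-level vector) can be illustrated with a minimal sketch. This is not the authors' method: the record does not describe the DTE layer itself, so plain mean pooling stands in for it here, and the function names `sample_indices` and `aggregate` are hypothetical.

```python
def sample_indices(num_frames, num_samples):
    """Return num_samples frame indices spread evenly over the whole video.

    Assumption: one frame is taken from the centre of each of num_samples
    equal-length segments; the paper's exact sampling rule is not given here.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    step = num_frames / num_samples
    return [min(num_frames - 1, int((i + 0.5) * step)) for i in range(num_samples)]


def aggregate(frame_features):
    """Pool frame-level feature vectors into one video-level vector.

    Placeholder only: mean pooling is used in place of the DTE layer,
    whose internals are not specified in this record.
    """
    if not frame_features:
        raise ValueError("need at least one frame feature")
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


if __name__ == "__main__":
    # Four frames sampled from a 100-frame video.
    print(sample_indices(100, 4))
    # Two toy 2-dimensional frame features pooled into one vector.
    print(aggregate([[1.0, 2.0], [3.0, 4.0]]))
```

Mean pooling ignores frame order, so it is not sequence-aware in the sense the abstract describes; it only shows where an encoding layer would sit between frame-level features and the video-level representation.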

Pages: 1109-1114

Page count: 6
