Spatio-Temporal Laplacian Pyramid Coding for Action Recognition

Cited by: 191

Authors:

Shao, Ling [1,2]

Zhen, Xiantong [2]

Tao, Dacheng [3,4]

Li, Xuelong [5]

Affiliations:

[1] Nanjing Univ Informat Sci & Technol, Coll Elect & Informat Engn, Nanjing 210044, Jiangsu, Peoples R China

[2] Univ Sheffield, Dept Elect & Elect Engn, Sheffield S1 3JD, S Yorkshire, England

[3] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Ultimo, NSW 2007, Australia

[4] Univ Technol Sydney, Fac Engn & Informat Technol, Ultimo, NSW 2007, Australia

[5] Chinese Acad Sci, Xian Inst Opt & Precis Mech, State Key Lab Transient Opt & Photon, Ctr OPT IMagery Anal & Learning, Xian 710119, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2014 / Vol. 44 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Action recognition; computer vision; max pooling; spatio-temporal Laplacian pyramid; FEATURES; CONTEXT; MODEL;

DOI:

10.1109/TCYB.2013.2273174

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We present a novel descriptor, called spatio-temporal Laplacian pyramid coding (STLPC), for holistic representation of human actions. In contrast to sparse representations based on detected local interest points, STLPC regards a video sequence as a whole with spatio-temporal features directly extracted from it, which prevents the loss of information in sparse representations. Through decomposing each sequence into a set of band-pass-filtered components, the proposed pyramid model localizes features residing at different scales, and therefore is able to effectively encode the motion information of actions. To make features further invariant and resistant to distortions as well as noise, a bank of 3-D Gabor filters is applied to each level of the Laplacian pyramid, followed by max pooling within filter bands and over spatio-temporal neighborhoods. Since the convolving and pooling are performed spatio-temporally, the coding model can capture structural and motion information simultaneously and provide an informative representation of actions. The proposed method achieves superb recognition rates on the KTH, the multiview IXMAS, the challenging UCF Sports, and the newly released HMDB51 datasets. It outperforms state-of-the-art methods, showing its great potential for action recognition.
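
The decomposition and pooling steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 5-tap binomial kernel, the factor-2 subsampling, the nearest-neighbour upsampling, the pooling window, and all function names are assumptions chosen for illustration, and the 3-D Gabor filter bank is omitted. It only shows how a video volume (time, height, width) might be split into band-pass levels and then max-pooled over spatio-temporal neighborhoods.

```python
import numpy as np

# Assumed 5-tap binomial low-pass kernel (sums to 1); not taken from the paper.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur3d(video):
    """Separable low-pass filter along t, y, x with edge-replicating padding."""
    out = video.astype(float)
    for axis in range(3):
        pad = [(0, 0)] * 3
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, KERNEL, mode="valid"), axis, padded
        )
    return out


def upsample(video, shape):
    """Nearest-neighbour upsampling by 2 on every axis, cropped to `shape`."""
    up = video
    for axis in range(3):
        up = np.repeat(up, 2, axis=axis)
    return up[: shape[0], : shape[1], : shape[2]]


def laplacian_pyramid(video, levels=3):
    """Return `levels` volumes: band-pass levels, then a low-pass residual."""
    bands = []
    current = video.astype(float)
    for _ in range(levels - 1):
        low = blur3d(current)[::2, ::2, ::2]  # blur, then subsample in t, y, x
        bands.append(current - upsample(low, current.shape))  # band-pass detail
        current = low
    bands.append(current)  # coarsest low-pass residual
    return bands


def reconstruct(bands):
    """Invert the pyramid by upsampling and adding levels, coarse to fine."""
    rec = bands[-1]
    for band in reversed(bands[:-1]):
        rec = upsample(rec, band.shape) + band
    return rec


def max_pool3d(volume, size=2):
    """Max pooling over non-overlapping size x size x size neighbourhoods."""
    t, y, x = (d // size * size for d in volume.shape)
    v = volume[:t, :y, :x]
    v = v.reshape(t // size, size, y // size, size, x // size, size)
    return v.max(axis=(1, 3, 5))
```

For example, `max_pool3d(np.abs(laplacian_pyramid(video)[0]))` would pool the magnitude of the finest band-pass level; in the method described above, pooling is applied to Gabor filter responses rather than to the raw band-pass levels.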


Pages: 817-827

Number of pages: 11

