Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos

Cited by: 33

Authors:

Duta, Ionut Cosmin [1]

Ionescu, Bogdan [2]

Aizawa, Kiyoharu [3]

Sebe, Nicu [1]

Affiliations:

[1] Univ Trento, Trento, Italy

[2] Univ Politehn Bucuresti, Bucharest, Romania

[3] Univ Tokyo, Tokyo, Japan

Source:

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017

Keywords:

CLASSIFICATION;

DOI:

10.1109/CVPR.2017.341

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for local deep features encoding. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, by using the similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal to capture the highest feature responses from the highest neuron activation of the network. Our ST-VLMPF clearly provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining a low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
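The two-level assignment with max pooling described in the abstract can be illustrated loosely as follows. This is a minimal sketch based only on the abstract, not the authors' implementation: the function names, the nearest-codeword assignment rule, the use of a second codebook over (x, y, t) positions, and the plain concatenation of the two encodings are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(v: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to v (hypothetical assignment rule)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))


def max_pool_by_assignment(
    features: Sequence[Vector], assignments: Sequence[int], num_groups: int
) -> List[float]:
    """Element-wise max over the features of each group, concatenated.

    Groups that receive no features are left as zero vectors.
    """
    dim = len(features[0])
    pooled = [[0.0] * dim for _ in range(num_groups)]
    seen = [False] * num_groups
    for feat, group in zip(features, assignments):
        if seen[group]:
            pooled[group] = [max(p, x) for p, x in zip(pooled[group], feat)]
        else:
            pooled[group] = list(feat)
            seen[group] = True
    return [value for group in pooled for value in group]


def encode_video(
    features: Sequence[Vector],
    positions: Sequence[Vector],
    feature_codebook: Sequence[Vector],
    position_codebook: Sequence[Vector],
) -> List[float]:
    """Assumed two-level encoding: pool by feature similarity, then by position."""
    by_similarity = [nearest_codeword(f, feature_codebook) for f in features]
    by_position = [nearest_codeword(p, position_codebook) for p in positions]
    return max_pool_by_assignment(
        features, by_similarity, len(feature_codebook)
    ) + max_pool_by_assignment(features, by_position, len(position_codebook))


# Toy input: three 2-D local features with normalized (x, y, t) positions.
features = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]]
positions = [[0.1, 0.1, 0.0], [0.9, 0.9, 1.0], [0.2, 0.1, 0.1]]
feature_codebook = [[0.0, 0.0], [3.0, 1.0]]
position_codebook = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

encoding = encode_video(features, positions, feature_codebook, position_codebook)
print(encoding)
```

In this toy setup the output length is (number of feature codewords + number of position codewords) times the feature dimension, so it does not depend on how many local features the video contains.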

Pages: 3205-3214

Page count: 10

