Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos

Cited by: 33

Authors:

Duta, Ionut Cosmin [1]

Ionescu, Bogdan [2]

Aizawa, Kiyoharu [3]

Sebe, Nicu [1]

Affiliations:

[1] Univ Trento, Trento, Italy

[2] Univ Politehn Bucuresti, Bucharest, Romania

[3] Univ Tokyo, Tokyo, Japan

Source:

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017

Keywords:

CLASSIFICATION;

DOI:

10.1109/CVPR.2017.341

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for encoding local deep features. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, using similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal of capturing the highest feature responses from the highest neuron activation of the network. Our ST-VLMPF clearly provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining a low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
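
The two-level assignment and max pooling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`nearest_centroid`, `max_pool_by_cluster`, `encode_video`) are hypothetical, both codebooks are assumed to be given (for example, learned beforehand with k-means), features are assumed to be non-negative CNN activations, and plain per-cluster max pooling with concatenation is an assumption about how the two encodings are combined. The published method may differ in its details.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Return, for each row of `points`, the index of the closest centroid."""
    # Squared Euclidean distance between every point and every centroid.
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def max_pool_by_cluster(features, labels, n_clusters):
    """Max-pool the features assigned to each cluster; empty clusters stay zero."""
    pooled = np.zeros((n_clusters, features.shape[1]))
    for k in range(n_clusters):
        members = features[labels == k]
        if len(members):
            pooled[k] = members.max(axis=0)
    return pooled.ravel()

def encode_video(features, positions, feat_codebook, pos_codebook):
    """Hypothetical two-level encoding (assumption, not the paper's exact recipe).

    features:      (n, d) local deep features from the whole video
    positions:     (n, 3) normalized (x, y, t) location of each feature
    feat_codebook: (k1, d) centroids for similarity-based assignment
    pos_codebook:  (k2, 3) centroids for spatio-temporal assignment
    """
    sim_labels = nearest_centroid(features, feat_codebook)
    st_labels = nearest_centroid(positions, pos_codebook)
    sim_part = max_pool_by_cluster(features, sim_labels, len(feat_codebook))
    st_part = max_pool_by_cluster(features, st_labels, len(pos_codebook))
    # Concatenate the two encodings into one video-level vector.
    return np.concatenate([sim_part, st_part])
```

In this sketch the output length is `(k1 + k2) * d`, so the representation size depends only on the codebook sizes and the feature dimension, not on the number of frames in the video.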

Citation:

Pages: 3205-3214

Page count: 10
