Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos

Times cited: 33

Authors:

Duta, Ionut Cosmin [1]

Ionescu, Bogdan [2]

Aizawa, Kiyoharu [3]

Sebe, Nicu [1]

Affiliations:

[1] Univ Trento, Trento, Italy

[2] Univ Politehn Bucuresti, Bucharest, Romania

[3] Univ Tokyo, Tokyo, Japan

Source:

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017

Keywords:

CLASSIFICATION;

DOI:

10.1109/CVPR.2017.341

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce the Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method designed specifically for encoding local deep features. The proposed method addresses an important problem in video understanding: how to build a video representation that incorporates the CNN features over the entire video. Features are assigned at two levels, using similarity and spatio-temporal information. For each assignment we build a specific encoding, tailored to the nature of deep features, with the goal of capturing the highest feature responses, which correspond to the highest neuron activations of the network. ST-VLMPF provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
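The two-level assignment and max-pooling idea in the abstract can be illustrated with a simplified sketch. This is not the authors' implementation; it assumes that two codebooks are already available (for example from k-means), one in deep-feature space and one over normalized (x, y, t) positions, that assignment is hard nearest-centroid, that each group is encoded by an element-wise max over its assigned features, and that the concatenated vector is power (signed square root) and L2 normalized. All function names are hypothetical. With k1 feature centroids, k2 position centroids and D-dimensional features, the output has (k1 + k2) × D entries.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def nearest(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    best_k, best_d = 0, math.inf
    for k, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best_k, best_d = k, d
    return best_k


def max_pool_by_assignment(features: Sequence[Sequence[float]],
                           assignments: Sequence[int],
                           num_clusters: int) -> Vector:
    """Element-wise max over the features assigned to each cluster, concatenated
    cluster by cluster. Empty clusters contribute a zero block.
    Assumes at least one feature is given."""
    dim = len(features[0])
    pooled: List[Optional[Vector]] = [None] * num_clusters
    for f, k in zip(features, assignments):
        current = pooled[k]
        pooled[k] = list(f) if current is None else [max(a, b) for a, b in zip(current, f)]
    out: Vector = []
    for block in pooled:
        out.extend(block if block is not None else [0.0] * dim)
    return out


def power_l2_normalize(vec: Sequence[float]) -> Vector:
    """Signed square root (power normalization), then L2 normalization."""
    v = [math.copysign(math.sqrt(abs(x)), x) for x in vec]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def st_vlmpf_sketch(features: Sequence[Sequence[float]],
                    positions: Sequence[Sequence[float]],
                    feat_centroids: Sequence[Sequence[float]],
                    pos_centroids: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical two-level encoding: group features by similarity and by
    (x, y, t) location, max-pool each group, concatenate, then normalize."""
    sim_assign = [nearest(f, feat_centroids) for f in features]  # similarity level
    st_assign = [nearest(p, pos_centroids) for p in positions]   # spatio-temporal level
    encoding = (max_pool_by_assignment(features, sim_assign, len(feat_centroids))
                + max_pool_by_assignment(features, st_assign, len(pos_centroids)))
    return power_l2_normalize(encoding)


if __name__ == "__main__":
    # Toy data: three 2-D "deep features" with normalized (x, y, t) positions.
    features = [[1.0, 0.0], [0.0, 4.0], [0.5, 0.2]]
    positions = [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.8, 0.9, 1.0]]
    feat_centroids = [[1.0, 0.0], [0.0, 1.0]]
    pos_centroids = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    video_vector = st_vlmpf_sketch(features, positions, feat_centroids, pos_centroids)
    print(len(video_vector), video_vector)
```

Max pooling keeps, for every dimension and group, the strongest response among the assigned features, which mirrors the stated goal of capturing the highest activations. Residual-summing schemes such as VLAD instead aggregate differences to the centroid. How the published ST-VLMPF combines the two assignments may differ from this concatenation.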


Pages: 3205 - 3214

Number of pages: 10

Related records (50 in total):

[21] Learning spatio-temporal features for action recognition from the side of the video
Pei, Lishen
Ye, Mao
Zhao, Xuezhuan
Xiang, Tao
Li, Tao
SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (01) : 199 - 206
[22] A fast human action recognition network based on spatio-temporal features
Xu, Jie
Song, Rui
Wei, Haoliang
Guo, Jinhong
Zhou, Yifei
Huang, Xiwei
Neurocomputing, 2021, 441 : 350 - 358
[23] Action Recognition via an Improved Local Descriptor for Spatio-temporal Features
Yang, Kai
Du, Ji-Xiang
Zhai, Chuan-Min
ADVANCED INTELLIGENT COMPUTING, 2011, 6838 : 234 - 241
[24] Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features
Ji, Yanli
Shimada, Atsushi
Taniguchi, Rin-ichiro
NEURAL INFORMATION PROCESSING: MODELS AND APPLICATIONS, PT II, 2010, 6444 : 391 - 398
[25] Learning to Represent Spatio-Temporal Features for Fine Grained Action Recognition
Sakhalkar, Kaustubh
Bremond, Francois
2018 IEEE THIRD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, APPLICATIONS AND SYSTEMS (IPAS), 2018 : 268 - 272
[27] Human Action Recognition in Video by Fusion of Structural and Spatio-temporal Features
Borzeshi, Ehsan Zare
Concha, Oscar Perez
Piccardi, Massimo
STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2012, 7626 : 474 - 482
[28] Improved Spatio-temporal Action Localization for Surveillance Videos
Liang, Morgan
Li, Xun
Onie, Sandersan
Larsen, Mark
Sowmya, Arcot
2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021), 2021 : 147 - 154
[29] Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals
Song, Yeongtaek
Kim, Incheol
SENSORS, 2019, 19 (05)
[30] Action Recognition Using Super Sparse Coding Vector with Spatio-temporal Awareness
Yang, Xiaodong
Tian, YingLi
COMPUTER VISION - ECCV 2014, PT II, 2014, 8690 : 727 - 741
