Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos

Cited by: 33
Authors
Duta, Ionut Cosmin [1]
Ionescu, Bogdan [2]
Aizawa, Kiyoharu [3]
Sebe, Nicu [1]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Univ Politehn Bucuresti, Bucharest, Romania
[3] Univ Tokyo, Tokyo, Japan
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Keywords
CLASSIFICATION;
DOI
10.1109/CVPR.2017.341
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for local deep features encoding. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, by using the similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal to capture the highest feature responses from the highest neuron activation of the network. Our ST-VLMPF clearly provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining a low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
Pages: 3205-3214
Page count: 10
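
The two-level assignment and max pooling described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the nearest-centroid assignment, the precomputed codebooks, and the plain concatenation of the two encodings are all assumptions made for illustration, and the real pipeline very likely includes steps the abstract does not mention. The sketch also assumes non-negative features (for example, post-ReLU activations), so an empty group can safely be left as zeros.

```python
import numpy as np

def max_pool_by_assignment(features, assignments, num_groups):
    """Max-pool the features in each group; empty groups stay at zero."""
    pooled = np.zeros((num_groups, features.shape[1]))
    for g in range(num_groups):
        members = features[assignments == g]
        if len(members):
            pooled[g] = members.max(axis=0)
    return pooled.ravel()

def nearest_centroid(points, centroids):
    """Index of the closest centroid (squared Euclidean distance) per point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def encode_video(features, positions, feat_codebook, pos_codebook):
    """Hypothetical two-level encoding: pool by similarity, then by position.

    features:      (n, d) local deep features from one video
    positions:     (n, 3) normalized (x, y, t) location of each feature
    feat_codebook: (k1, d) centroids learned in feature space (assumed given)
    pos_codebook:  (k2, 3) centroids learned in position space (assumed given)
    """
    sim_assign = nearest_centroid(features, feat_codebook)
    st_assign = nearest_centroid(positions, pos_codebook)
    return np.concatenate([
        max_pool_by_assignment(features, sim_assign, len(feat_codebook)),
        max_pool_by_assignment(features, st_assign, len(pos_codebook)),
    ])
```

For n features of dimension d, with k1 feature centroids and k2 position centroids, `encode_video` returns a vector of length (k1 + k2) * d. Each entry holds the strongest response seen in one group for one feature dimension, which is one way to read "capture the highest feature responses".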