Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos

Cited by: 33
Authors
Duta, Ionut Cosmin [1]
Ionescu, Bogdan [2]
Aizawa, Kiyoharu [3]
Sebe, Nicu [1]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Univ Politehn Bucuresti, Bucharest, Romania
[3] Univ Tokyo, Tokyo, Japan
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Keywords
CLASSIFICATION
DOI
10.1109/CVPR.2017.341
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce the Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super-vector-based encoding method designed specifically for encoding local deep features. The proposed method addresses an important problem in video understanding: how to build a video representation that incorporates CNN features from the entire video. Feature assignment is carried out at two levels, using similarity and spatio-temporal information. For each assignment we build a specific encoding, tailored to the nature of deep features, with the goal of capturing the highest feature responses from the highest neuron activations of the network. ST-VLMPF clearly provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
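The abstract only outlines the encoding. The general idea is hard assignment of local features to clusters at two levels, followed by per-cluster max pooling. The sketch below is a minimal, simplified illustration of that idea and makes several assumptions that the abstract does not state. It assumes that k-means centroids have already been learned for the feature vectors and for normalized (x, y, t) positions. It also assumes plain element-wise max pooling per cluster and signed square-root plus L2 normalization of the concatenated result. The function names (`nearest`, `max_pool_encode`, `normalize`, `st_vlmpf_sketch`) are hypothetical, and this is not the authors' exact formulation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Hard assignment: index of the centroid closest to x (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
    )


def max_pool_encode(
    features: Sequence[Sequence[float]], assignments: Sequence[int], num_clusters: int
) -> Vector:
    """Element-wise max over the features assigned to each cluster, concatenated.

    Clusters that receive no features contribute a block of zeros.
    """
    dim = len(features[0])
    pooled = [[-math.inf] * dim for _ in range(num_clusters)]
    for feat, k in zip(features, assignments):
        pooled[k] = [max(p, v) for p, v in zip(pooled[k], feat)]
    out: Vector = []
    for block in pooled:
        out.extend(0.0 if v == -math.inf else v for v in block)
    return out


def normalize(v: Vector) -> Vector:
    """Signed square-root (power) normalization followed by L2 normalization."""
    v = [math.copysign(math.sqrt(abs(x)), x) for x in v]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def st_vlmpf_sketch(features, positions, feat_centroids, pos_centroids) -> Vector:
    """Hypothetical two-level encoding of one video's local deep features.

    features:  one CNN feature vector per local descriptor
    positions: matching (x, y, t) locations, assumed normalized to [0, 1]
    """
    # Level 1 (assumed): assign each feature by similarity in feature space.
    sim_assign = [nearest(f, feat_centroids) for f in features]
    sim_part = max_pool_encode(features, sim_assign, len(feat_centroids))
    # Level 2 (assumed): assign each feature by where it occurs in the video volume.
    st_assign = [nearest(p, pos_centroids) for p in positions]
    st_part = max_pool_encode(features, st_assign, len(pos_centroids))
    # Video representation: both parts concatenated, then normalized.
    return normalize(sim_part + st_part)
```

With K_f feature centroids, K_s spatio-temporal centroids and D-dimensional features, this sketch produces a vector of (K_f + K_s) x D entries regardless of how many local features a video contains. As a result, videos of different lengths map to vectors of the same size.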
Pages: 3205-3214
Page count: 10
Related papers
50 in total
  • [41] Spatio-Temporal Activity Detection and Recognition in Untrimmed Surveillance Videos
    Gkountakos, Konstantinos
    Touska, Despoina
    Ioannidis, Konstantinos
    Tsikrika, Theodora
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 451 - 455
  • [42] Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks
    Wang, Y.
    Shen, X. J.
    Chen, H. P.
    Sun, J. X.
    Pattern Recognition and Image Analysis, 2021, 31 : 580 - 587
  • [43] A spatio-temporal recurrent network for salmon feeding action recognition from underwater videos in aquaculture
    Maloy, Hakon
    Aamodt, Agnar
    Misimi, Ekrem
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2019, 167
  • [44] STP-Net: Spatio-Temporal Polarization Network for action recognition using polarimetric videos
    Kanth, R. Krishna
    Ramaswamy, Akshaya
    Kumar, A. Anil
    Gubbi, Jayavardhana
    Balamuralidhar, P.
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 767 - 776
  • [45] Evaluating a bag-of-visual features approach using spatio-temporal features for action recognition
    Nazir, Saima
    Yousaf, Muhammad Haroon
    Velastin, Sergio A.
    COMPUTERS & ELECTRICAL ENGINEERING, 2018, 72 : 660 - 669
  • [46] Vision-based action recognition of earthmoving equipment using spatio-temporal features and support vector machine classifiers
    Golparvar-Fard, Mani
    Heydarian, Arsalan
    Carlos Niebles, Juan
    ADVANCED ENGINEERING INFORMATICS, 2013, 27 (04) : 652 - 663
  • [47] Learning Spatio-Temporal Features for Action Recognition with Modified Hidden Conditional Random Field
    Xu, Wanru
    Miao, Zhenjiang
    Zhang, Jian
    Tian, Yi
    COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 786 - 801
  • [48] Spatio-Temporal Features in Action Recognition Using 3D Skeletal Joints
    Trascau, Mihai
    Nan, Mihai
    Florea, Adina Magda
    SENSORS, 2019, 19 (02)
  • [49] Combining Handcrafted Spatio-Temporal and Deep Spatial Features for Effective Human Action Recognition
    R. Divya Rani
    C. J. Prabhakar
    Human-Centric Intelligent Systems, 2025, 5 (1) : 123 - 150
  • [50] Graph-based approach for human action recognition using spatio-temporal features
    Ben Aoun, Najib
    Mejdoub, Mahmoud
    Ben Amar, Chokri
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2014, 25 (02) : 329 - 338