Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos

Cited by: 33
Authors
Duta, Ionut Cosmin [1]
Ionescu, Bogdan [2]
Aizawa, Kiyoharu [3]
Sebe, Nicu [1]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Univ Politehn Bucuresti, Bucharest, Romania
[3] Univ Tokyo, Tokyo, Japan
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Keywords
CLASSIFICATION;
DOI
10.1109/CVPR.2017.341
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for encoding local deep features. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, using similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal of capturing the highest feature responses from the highest neuron activations of the network. Our ST-VLMPF clearly provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining a low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
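The two-level assignment followed by max pooling that the abstract describes might be sketched roughly as follows. This is an illustrative simplification and not the authors' implementation: the abstract does not give the details of the encoding, so the function names (`nearest`, `max_pool_by_group`, `encode_video`), the use of nearest-centroid assignment, and the assumption that both codebooks are already available (for example from k-means) are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(point: Vector, centers: Sequence[Vector]) -> int:
    """Return the index of the center closest to `point` (squared Euclidean)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def max_pool_by_group(
    features: Sequence[Vector], groups: Sequence[int], n_groups: int
) -> List[float]:
    """Element-wise max of the features in each group, concatenated.

    Groups that receive no feature contribute a block of zeros.
    """
    dim = len(features[0])
    pooled = [[0.0] * dim for _ in range(n_groups)]
    seen = [False] * n_groups
    for feature, group in zip(features, groups):
        if seen[group]:
            pooled[group] = [max(a, b) for a, b in zip(pooled[group], feature)]
        else:
            pooled[group] = list(feature)
            seen[group] = True
    return [value for block in pooled for value in block]


def encode_video(
    features: Sequence[Vector],      # local deep features of one video
    positions: Sequence[Vector],     # assumed (x, y, t) location of each feature
    feat_centers: Sequence[Vector],  # assumed codebook over feature vectors
    pos_centers: Sequence[Vector],   # assumed codebook over positions
) -> List[float]:
    """Hypothetical two-level encoding: similarity-based, then position-based."""
    feat_groups = [nearest(f, feat_centers) for f in features]
    pos_groups = [nearest(p, pos_centers) for p in positions]
    return (
        max_pool_by_group(features, feat_groups, len(feat_centers))
        + max_pool_by_group(features, pos_groups, len(pos_centers))
    )


# Toy example: three 2-D features, two codewords per codebook.
features = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]]
positions = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 1.0, 0.9]]
feat_centers = [[0.0, 0.0], [3.0, 1.0]]
pos_centers = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
encoding = encode_video(features, positions, feat_centers, pos_centers)
print(encoding)  # [1.0, 2.0, 3.0, 1.0, 1.0, 0.0, 3.0, 2.0]
```

In this toy run the first four values come from grouping by feature similarity and the last four from grouping by spatio-temporal position, so the final vector has length (2 + 2) x 2 = 8. The published method may include further components that the abstract does not mention.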
Pages: 3205 - 3214
Page count: 10
Related papers
50 in total
  • [1] Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
    Duta, Ionut C.
    Ionescu, Bogdan
    Aizawa, Kiyoharu
    Sebe, Nicu
    MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 365 - 378
  • [2] Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
    Li, Dong
    Yao, Ting
    Duan, Ling-Yu
    Mei, Tao
    Rui, Yong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) : 416 - 428
  • [3] SKELETON ACTION RECOGNITION BASED ON SPATIO-TEMPORAL FEATURES
    Huang, Qian
    Xie, Mengting
    Li, Xing
    Wang, Shuaichen
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3284 - 3288
  • [4] Spatio-temporal Semantic Features for Human Action Recognition
    Liu, Jia
    Wang, Xiaonian
    Li, Tianyu
    Yang, Jie
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2012, 6 (10) : 2632 - 2649
  • [5] Human Action Recognition Based on Spatio-temporal Features
    Sawant, Nikhil
    Biswas, K. K.
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 357 - 362
  • [6] Spatio-Temporal Human-Object Interactions for Action Recognition in Videos
    Escorcia, Victor
    Carlos Niebles, Juan
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2013, : 508 - 514
  • [7] Action Recognition in Dark Videos Using Spatio-Temporal Features and Bidirectional Encoder Representations from Transformers
    Singh H.
    Suman S.
    Subudhi B.N.
    Jakhetiya V.
    Ghosh A.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (06) : 1461 - 1471
  • [8] Action Recognition Using Discriminative Spatio-Temporal Neighborhood Features
    Cheng, Shi-Lei
    Yang, Jiang-Feng
    Ma, Zheng
    Xie, Mei
    INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND INFORMATION SECURITY (CNIS 2015), 2015, : 166 - 172
  • [9] Action recognition using spatio-temporal regularity based features
    Goodhart, Taylor
    Yan, Pingkun
    Shah, Mubarak
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 745 - 748
  • [10] Accelerated Learning of Discriminative Spatio-temporal Features for Action Recognition
    Varshney, Munender
    Rameshan, Renu
    2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2016,