Spatio-temporal Channel Correlation Networks for Action Classification

Cited by: 128
Authors
Diba, Ali [1, 4]
Fayyaz, Mohsen [2]
Sharma, Vivek [3]
Arzani, M. Mahdi [4]
Yousefzadeh, Rahman [4]
Gall, Juergen [2]
Van Gool, Luc [1, 4]
Affiliations
[1] Katholieke Univ Leuven, ESAT PSI, Leuven, Belgium
[2] Univ Bonn, Bonn, Germany
[3] KIT, CV HCI, Karlsruhe, Germany
[4] Sensifai, Brussels, Belgium
Source
COMPUTER VISION - ECCV 2018, PT IV | 2018, Vol. 11208
Funding
European Research Council;
Keywords
RECOGNITION; HISTOGRAMS;
DOI
10.1007/978-3-030-01225-0_18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between the channels of a 3D CNN with respect to temporal and spatial features. This block can be added as a residual unit to different parts of 3D CNNs. We name it 'Spatio-Temporal Channel Correlation' (STC). By embedding this block in current state-of-the-art architectures such as ResNext and ResNet, we improve performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to these architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. Another issue in training 3D CNNs is that they must be trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned by 2D CNNs is completely ignored. A further contribution of this work is a simple and effective technique for transferring knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
Pages: 299-315
Number of pages: 17