Spatio-temporal Channel Correlation Networks for Action Classification

Cited by: 128
Authors
Diba, Ali [1, 4]
Fayyaz, Mohsen [2]
Sharma, Vivek [3]
Arzani, M. Mahdi [4]
Yousefzadeh, Rahman [4]
Gall, Juergen [2]
Van Gool, Luc [1, 4]
Affiliations
[1] Katholieke Univ Leuven, ESAT PSI, Leuven, Belgium
[2] Univ Bonn, Bonn, Germany
[3] KIT, CV HCI, Karlsruhe, Germany
[4] Sensifai, Brussels, Belgium
Source
COMPUTER VISION - ECCV 2018, PT IV | 2018 / Vol. 11208
Funding
European Research Council;
Keywords
RECOGNITION; HISTOGRAMS;
DOI
10.1007/978-3-030-01225-0_18
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between channels of a 3D CNN with respect to temporal and spatial features. This new block can be added as a residual unit to different parts of 3D CNNs. We name our novel block 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into current state-of-the-art architectures such as ResNext and ResNet, we improve the performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to current state-of-the-art architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. A second issue in training 3D CNNs is that they are trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
Pages: 299 - 315
Number of pages: 17