Spatio-temporal Channel Correlation Networks for Action Classification

Cited by: 128
Authors
Diba, Ali [1, 4]
Fayyaz, Mohsen [2]
Sharma, Vivek [3]
Arzani, M. Mahdi [4]
Yousefzadeh, Rahman [4]
Gall, Juergen [2]
Van Gool, Luc [1, 4]
Affiliations
[1] Katholieke Univ Leuven, ESAT PSI, Leuven, Belgium
[2] Univ Bonn, Bonn, Germany
[3] KIT, CV HCI, Karlsruhe, Germany
[4] Sensifai, Brussels, Belgium
Source
COMPUTER VISION - ECCV 2018, PT IV | 2018, vol. 11208
Funding
European Research Council;
Keywords
RECOGNITION; HISTOGRAMS;
DOI
10.1007/978-3-030-01225-0_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between the channels of a 3D CNN with respect to temporal and spatial features. This new block can be added as a residual unit to different parts of 3D CNNs. We name our novel block 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into current state-of-the-art architectures such as ResNeXt and ResNet, we improve performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to current state-of-the-art architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. The other issue in training 3D CNNs is that they must be trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned by 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
Pages: 299 - 315
Number of pages: 17
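The abstract describes a block that models correlations between the channels of a 3D CNN with respect to temporal and spatial features, and that is added as a residual unit. The record does not give the block's internals, so the following is only a rough NumPy sketch of that general idea, assuming a squeeze-and-excitation-style channel gate. The function names (`stc_like_block`, `channel_gate`, `init_params`), the two pooling branches, the bottleneck ratio, and the way the branches are summed are all illustrative assumptions, not the authors' published design.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(channels, reduction=4, seed=0):
    """Random bottleneck weights for the two branches (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    hidden = max(channels // reduction, 1)

    def pair():
        return (rng.normal(0.0, 0.1, (hidden, channels)),
                rng.normal(0.0, 0.1, (channels, hidden)))

    return {"global": pair(), "per_frame": pair()}


def channel_gate(desc, w_down, w_up):
    """Map channel descriptors of shape (C, N) to gates in (0, 1), shape (C, N)."""
    hidden = np.maximum(w_down @ desc, 0.0)  # ReLU bottleneck, shape (hidden, N)
    return sigmoid(w_up @ hidden)


def stc_like_block(x, params):
    """x has shape (C, T, H, W); the output has the same shape.

    Assumed design: one branch pools over time and space, the other pools
    over space only (keeping the time axis). Each branch re-weights the
    channels, and both are added back to the input as a residual.
    """
    c = x.shape[0]
    global_desc = x.mean(axis=(1, 2, 3)).reshape(c, 1)       # (C, 1)
    global_gate = channel_gate(global_desc, *params["global"])
    frame_desc = x.mean(axis=(2, 3))                          # (C, T)
    frame_gate = channel_gate(frame_desc, *params["per_frame"])
    return (x
            + x * global_gate[:, :, None, None]   # broadcast over T, H, W
            + x * frame_gate[:, :, None, None])   # broadcast over H, W


if __name__ == "__main__":
    clip_features = np.random.default_rng(1).normal(size=(8, 4, 5, 5))
    out = stc_like_block(clip_features, init_params(channels=8))
    print(out.shape)
```

Because the output keeps the input shape, a block of this kind can in principle be placed after any 3D convolutional stage, which matches the abstract's statement that the block can be added to different parts of a 3D CNN.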