Spatio-temporal Channel Correlation Networks for Action Classification

Cited by: 129
Authors
Diba, Ali [1, 4]
Fayyaz, Mohsen [2]
Sharma, Vivek [3]
Arzani, M. Mahdi [4]
Yousefzadeh, Rahman [4]
Gall, Juergen [2]
Van Gool, Luc [1, 4]
Affiliations
[1] Katholieke Univ Leuven, ESAT PSI, Leuven, Belgium
[2] Univ Bonn, Bonn, Germany
[3] KIT, CV HCI, Karlsruhe, Germany
[4] Sensifai, Brussels, Belgium
Source
Funding
European Research Council;
Keywords
RECOGNITION; HISTOGRAMS;
DOI
10.1007/978-3-030-01225-0_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This work is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between the channels of a 3D CNN with respect to temporal and spatial features. This block can be added as a residual unit to different parts of 3D CNNs. We name it 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into current state-of-the-art architectures such as ResNeXt and ResNet, we improve performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to these architectures outperforms state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. A second issue is that 3D CNNs must be trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned by 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique for transferring knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for stable weight initialization. This allows us to significantly reduce the number of training samples needed for 3D CNNs. By fine-tuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
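The abstract does not say how the STC block is built internally. The sketch below is an illustration only, not the authors' implementation. It shows one way a residual unit can model channel dependencies in a 3D CNN feature map: pool each channel to a statistic, pass the statistics through a small bottleneck, and rescale the channels with the resulting gates. The function names, the two-branch split (one pooled over time and space, one pooled over space only), the shared weights and the plain-Python tensor layout `x[c][t][h][w]` are all assumptions made for this example.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def channel_gates(descriptor, w_reduce, w_expand):
    """Bottleneck: C statistics -> hidden units (ReLU) -> C gates in (0, 1)."""
    hidden = [max(0.0, h) for h in matvec(w_reduce, descriptor)]
    return [sigmoid(g) for g in matvec(w_expand, hidden)]


def channel_correlation_block(x, w_reduce, w_expand):
    """Apply two channel-gating branches plus an identity shortcut.

    x is a feature map indexed as x[c][t][h][w]. Sharing one set of weights
    between both branches is a simplification made for this sketch.
    """
    C, T, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])

    # Branch A (assumed): one statistic per channel, averaged over time and space.
    global_desc = [
        sum(x[c][t][h][w] for t in range(T) for h in range(H) for w in range(W))
        / (T * H * W)
        for c in range(C)
    ]
    gates_a = channel_gates(global_desc, w_reduce, w_expand)

    # Branch B (assumed): one statistic per channel and frame, averaged over
    # space only, so a channel's gate may change from frame to frame.
    gates_b = []
    for t in range(T):
        frame_desc = [
            sum(x[c][t][h][w] for h in range(H) for w in range(W)) / (H * W)
            for c in range(C)
        ]
        gates_b.append(channel_gates(frame_desc, w_reduce, w_expand))

    # Residual combination: identity shortcut plus both rescaled copies.
    return [
        [
            [
                [x[c][t][h][w] * (1.0 + gates_a[c] + gates_b[t][c]) for w in range(W)]
                for h in range(H)
            ]
            for t in range(T)
        ]
        for c in range(C)
    ]


# Toy usage with hypothetical sizes: 4 channels, 3 frames, 2x2 spatial grid.
rng = random.Random(0)
n_ch, n_frames, height, width, n_hidden = 4, 3, 2, 2, 2
features = [
    [[[rng.uniform(-1, 1) for _ in range(width)] for _ in range(height)]
     for _ in range(n_frames)]
    for _ in range(n_ch)
]
w_reduce = [[rng.gauss(0, 0.5) for _ in range(n_ch)] for _ in range(n_hidden)]
w_expand = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_ch)]
out = channel_correlation_block(features, w_reduce, w_expand)
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))
```

Because the block keeps the input shape and includes an identity shortcut, a unit of this kind can be dropped between existing layers of a network, which is consistent with the abstract's description of STC as a residual unit that can be added to different parts of a 3D CNN.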
Pages: 299-315
Page count: 17
Related papers
50 in total
  • [31] Exploring hybrid spatio-temporal convolutional networks for human action recognition
    Hao Wang
    Yanhua Yang
    Erkun Yang
    Cheng Deng
    Multimedia Tools and Applications, 2017, 76 : 15065 - 15081
  • [32] Spatio-Temporal GRU for Trajectory Classification
    Liu, Hong-Bin
    Wu, Hao
    Sun, Weiwei
    Lee, Ickjai
    2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 1228 - 1233
  • [33] Spatio-Temporal Saliency for Action Similarity
    Burghouts, G. J.
    van den Broek, S. P.
    ten Hove, R. J. M.
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2013, : 257 - 262
  • [34] Discovering spatio-temporal action tubes
    Ye, Yuancheng
    Yang, Xiaodong
    Tian, YingLi
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 58 : 515 - 524
  • [35] A Spatio-Temporal Approach for Apathy Classification
    Das, Abhijit
    Niu, Xuesong
    Dantcheva, Antitza
    Happy, S. L.
    Han, Hu
    Zeghari, Radia
    Robert, Philippe
    Shan, Shiguang
    Bremond, Francois
    Chen, Xilin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 2561 - 2573
  • [36] Spatio-temporal classification for polyp diagnosis
    Puyal, Juana Gonzalez-Bueno
    Brandao, Patrick
    Ahmad, Omer F.
    Bhatia, Kanwal K.
    Toth, Daniel
    Kader, Rawen
    Lovat, Laurence
    Mountney, Peter
    Stoyanov, Danail
    BIOMEDICAL OPTICS EXPRESS, 2023, 14 (02) : 593 - 607
  • [37] Effect of spatio-temporal channel correlation on the performance of space-time codes
    Fragouli, C
    Al-Dhahir, N
    Turin, W
    2002 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2002, : 826 - 830
  • [38] HUMAN ACTION CLASSIFICATION USING SURF BASED SPATIO-TEMPORAL CORRELATED DESCRIPTORS
    Sabri, A. Q. Md
    Boonaert, J.
    Lecoeuche, S.
    Mouaddib, E.
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1401 - 1404
  • [39] REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY
    Tim, Stefen Chan Wai
    Rombaut, Michele
    Pellerin, Denis
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 1133 - 1137
  • [40] Spatio-Temporal RBF Neural Networks
    Khan, Shujaat
    Ahmad, Jawwad
    Sadiq, Alishba
    Naseem, Imran
    Moinuddin, Muhammad
    2018 3RD INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING, SCIENCES AND TECHNOLOGY (ICEEST), 2018,