Spatio-temporal Channel Correlation Networks for Action Classification

Cited by: 134

Authors:

Diba, Ali [1,4]

Fayyaz, Mohsen [2]

Sharma, Vivek [3]

Arzani, M. Mahdi [4]

Yousefzadeh, Rahman [4]

Gall, Juergen [2]

Van Gool, Luc [1,4]

Affiliations:

[1] Katholieke Univ Leuven, ESAT PSI, Leuven, Belgium

[2] Univ Bonn, Bonn, Germany

[3] KIT, CV HCI, Karlsruhe, Germany

[4] Sensifai, Brussels, Belgium

Source:

COMPUTER VISION - ECCV 2018, PT IV | 2018 / Volume 11208

Funding:

European Research Council;

Keywords:

RECOGNITION; HISTOGRAMS;

DOI:

10.1007/978-3-030-01225-0_18

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between the channels of a 3D CNN with respect to temporal and spatial features. This block can be added as a residual unit to different parts of 3D CNNs. We name it the 'Spatio-Temporal Channel Correlation' (STC) block. By embedding this block into current state-of-the-art architectures such as ResNext and ResNet, we improve performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to these architectures outperforms state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. A second issue is that 3D CNNs are usually trained from scratch and need a huge labeled dataset to reach reasonable performance, so the knowledge learned by 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for stable weight initialization. This allows us to significantly reduce the number of training samples needed for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
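The abstract describes a block that re-weights the channels of a 3D CNN and is added as a residual unit. The sketch below illustrates that general idea only. It assumes a squeeze-and-excitation style gate (global average per channel, a small bottleneck, a sigmoid), which is an assumption of this note and not a description of the authors' STC block. The names `channel_gate`, `residual_reweight`, `w1` and `w2` are hypothetical.

```python
import math


def channel_gate(features, w1, w2):
    """Compute one gate value in (0, 1) per channel.

    features: nested list indexed [channel][time][height][width].
    w1, w2:   hypothetical bottleneck weights (lists of rows).
    """
    # Squeeze: one average per channel over time, height and width.
    desc = []
    for ch in features:
        vals = [v for frame in ch for row in frame for v in row]
        desc.append(sum(vals) / len(vals))
    # Bottleneck: hidden = relu(w1 @ desc), gate = sigmoid(w2 @ hidden).
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    return [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]


def residual_reweight(features, gate):
    # Residual form (an assumption): out = x + gate_c * x for channel c.
    return [
        [[[v + g * v for v in row] for row in frame] for frame in ch]
        for ch, g in zip(features, gate)
    ]
```

Because the gate is computed from a descriptor of all channels, each channel's scaling depends on the others, which is the sense in which such a block models inter-channel correlation.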


Pages: 299-315

Page count: 17
