Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

Cited by: 520
Authors
Chen, Yunpeng [1 ,2 ]
Fan, Haoqi [1 ]
Xu, Bing [1 ]
Yan, Zhicheng [1 ]
Kalantidis, Yannis [1 ]
Rohrbach, Marcus [1 ]
Yan, Shuicheng [2 ,3 ]
Feng, Jiashi [1 ]
Affiliations
[1] Facebook AI, Menlo Pk, CA 94025 USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Yitu Technol, Singapore, Singapore
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
DOI
10.1109/ICCV.2019.00353
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In natural images, information is conveyed at different frequencies, where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially "slower" at a lower spatial resolution, reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement of (vanilla) convolutions without any adjustments in the network architecture. It is also orthogonal and complementary to methods that suggest better topologies or reduce channel-wise redundancy like group or depth-wise convolutions. We experimentally show that by simply replacing convolutions with OctConv, we can consistently boost accuracy for both image and video recognition tasks, while reducing memory and computational cost. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs.
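To make the factorization described in the abstract concrete, below is a minimal sketch of an octave convolution layer written in PyTorch. It is not the authors' official implementation; the layer name OctConv2d, the channel-split ratio alpha, and the choice of average pooling plus nearest-neighbor upsampling for the cross-frequency paths are illustrative assumptions based on the abstract's description of high-frequency (full-resolution) and low-frequency (half-resolution) feature groups.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv2d(nn.Module):
    """Sketch of an octave convolution: channels are split into a
    high-frequency group at full resolution and a low-frequency group
    at half resolution, with four convolution paths between them."""

    def __init__(self, in_ch, out_ch, kernel_size=3, alpha=0.5, padding=1):
        super().__init__()
        # alpha: fraction of channels stored at half spatial resolution
        lo_in, lo_out = int(alpha * in_ch), int(alpha * out_ch)
        hi_in, hi_out = in_ch - lo_in, out_ch - lo_out
        # Four paths: high->high, high->low, low->low, low->high
        self.conv_hh = nn.Conv2d(hi_in, hi_out, kernel_size, padding=padding)
        self.conv_hl = nn.Conv2d(hi_in, lo_out, kernel_size, padding=padding)
        self.conv_ll = nn.Conv2d(lo_in, lo_out, kernel_size, padding=padding)
        self.conv_lh = nn.Conv2d(lo_in, hi_out, kernel_size, padding=padding)

    def forward(self, x_h, x_l):
        # x_h: high-frequency features at full resolution
        # x_l: low-frequency features at half resolution
        y_hh = self.conv_hh(x_h)
        y_hl = self.conv_hl(F.avg_pool2d(x_h, 2))  # downsample before high->low path
        y_ll = self.conv_ll(x_l)
        y_lh = F.interpolate(self.conv_lh(x_l), scale_factor=2,
                             mode='nearest')       # upsample low->high path
        # Outputs at both frequencies are sums of the same-resolution paths
        return y_hh + y_lh, y_ll + y_hl

# Usage example: 64 channels split evenly between the two resolutions
oct_conv = OctConv2d(64, 64, alpha=0.5)
x_h = torch.randn(1, 32, 56, 56)   # high-frequency half of the channels
x_l = torch.randn(1, 32, 28, 28)   # low-frequency half, at half resolution
y_h, y_l = oct_conv(x_h, x_l)
print(y_h.shape, y_l.shape)        # (1, 32, 56, 56) and (1, 32, 28, 28)

Because the low-frequency group is processed at half resolution, the conv_ll and cross-frequency paths operate on one quarter of the spatial positions, which is the source of the memory and FLOP savings the abstract reports.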
Pages: 3434-3443
Page count: 10