Spatiotemporal Initialization for 3D CNNs with Generated Motion Patterns

被引:7
作者
Kataoka, Hirokatsu [1 ]
Hara, Kensho [1 ]
Hayashi, Ryusuke [1 ]
Yamagata, Eisuke [2 ]
Inoue, Nakamasa [2 ]
机构
[1] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan
[2] Tokyo Inst Technol, Tokyo, Japan
来源
2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) | 2022年
关键词
PERCEPTION; RESPONSES; MODEL;
D O I
10.1109/WACV51458.2022.00081
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The paper proposes a framework of Formula-Driven Supervised Learning (FDSL) for spatiotemporal initialization. Our FDSL approach enables to automatically and simultaneously generate motion patterns and their video labels with a simple formula which is based on Perlin noise. We designed a dataset of generated motion patterns adequate for the 3D CNNs to learn a better basis set of natural videos. The constructed Video Perlin Noise (VPN) dataset can be applied to initialize a model before pre-training with large-scale video datasets such as Kinetics-400/700, to enhance target task performance. Our spatiotemporal initialization with VPN dataset (VPN initialization) outperforms the previous initialization method with the inflated 3D ConvNet (I3D) using 2D ImageNet dataset. Our proposed method increased the top-1 video-level accuracy of Kinetics-400 pre-trained model on {Kinetics-400, UCF-101, HMDB-51, ActivityNet} datasets. Especially, the proposed method increased the performance rate of Kinetics-400 pre-trained model by 10.3 pt on ActivityNet. We also report that the relative performance improvements from the baseline are greater in 3D CNNs rather than other models. Our VPN initialization mainly helps to enhance the performance in spatiotemporal 3D kernels. The datasets, codes and pretrained models used in this study will be publicly available(1).
引用
收藏
页码:737 / 746
页数:10
相关论文
共 38 条
[1]   SPATIOTEMPORAL ENERGY MODELS FOR THE PERCEPTION OF MOTION [J].
ADELSON, EH ;
BERGEN, JR .
JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 1985, 2 (02) :284-299
[2]  
Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698
[3]   Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].
Carreira, Joao ;
Zisserman, Andrew .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733
[4]  
Carreira Joao, 2019, CoRR
[5]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[6]   Learning Spatiotemporal Features with 3D Convolutional Networks [J].
Du Tran ;
Bourdev, Lubomir ;
Fergus, Rob ;
Torresani, Lorenzo ;
Paluri, Manohar .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497
[7]  
Farin G., 1993, Curves and Surfaces for Computer Aided Geometric Design: A Practical Guide
[8]   SlowFast Networks for Video Recognition [J].
Feichtenhofer, Christoph ;
Fan, Haoqi ;
Malik, Jitendra ;
He, Kaiming .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :6201-6210
[9]   SEPARATE VISUAL PATHWAYS FOR PERCEPTION AND ACTION [J].
GOODALE, MA ;
MILNER, AD .
TRENDS IN NEUROSCIENCES, 1992, 15 (01) :20-25
[10]   Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [J].
Hara, Kensho ;
Kataoka, Hirokatsu ;
Satoh, Yutaka .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6546-6555