A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

被引:153
作者
Feichtenhofer, Christoph [1 ]
Fan, Haoqi [1 ]
Xiong, Bo [1 ]
Girshick, Ross [1 ]
He, Kaiming [1 ]
机构
[1] Facebook AI Res FAIR, Paris, France
来源
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021年
关键词
D O I
10.1109/CVPR46437.2021.00331
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart.
引用
收藏
页码:3298 / 3308
页数:11
相关论文
共 96 条
[1]   Learning to See by Moving [J].
Agrawal, Pulkit ;
Carreira, Joao ;
Malik, Jitendra .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :37-45
[2]  
Alayrac J.-B., 2020, Advances in Neural Information Processing Systems, V33, P25
[3]  
Alwassel Humam., 2020, Advances in Neural Information Processing Systems, V33, P9758
[4]  
[Anonymous], 2016, P CVPR
[5]  
[Anonymous], 2020, NeurIPS
[6]  
[Anonymous], P INT C ROB AUT
[7]   Look, Listen and Learn [J].
Arandjelovic, Relja ;
Zisserman, Andrew .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :609-617
[8]  
Becker Suzanna, 1997, NEURIPS
[9]  
Benaim Sagie, 2020, P CVPR
[10]   Deep Clustering for Unsupervised Learning of Visual Features [J].
Caron, Mathilde ;
Bojanowski, Piotr ;
Joulin, Armand ;
Douze, Matthijs .
COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :139-156