A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Cited by: 153

Authors:

Feichtenhofer, Christoph [1]

Fan, Haoqi [1]

Xiong, Bo [1]

Girshick, Ross [1]

He, Kaiming [1]

Affiliations:

[1] Facebook AI Research (FAIR), Paris, France

Source:

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 | 2021

Keywords:

DOI:

10.1109/CVPR46437.2021.00331

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart.
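As an illustration of what "encouraging temporally-persistent features in the same video" could look like, the sketch below scores two clips from the same video as a positive pair and clips from other videos as negatives. The contrastive (InfoNCE-style) form, the function name `temporal_persistency_loss`, and the `temperature` value are assumptions made for this example; the abstract does not specify the loss, and the study covers four different frameworks.

```python
import math


def l2_normalize(v):
    # Scale a feature vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def temporal_persistency_loss(clips_a, clips_b, temperature=0.1):
    """Hypothetical InfoNCE-style loss over a batch of videos.

    clips_a[i] and clips_b[i] are feature vectors of two clips sampled at
    different times from video i. The loss is low when each clip is most
    similar to the other clip of its own video.
    """
    a = [l2_normalize(v) for v in clips_a]
    b = [l2_normalize(v) for v in clips_b]
    total = 0.0
    for i, query in enumerate(a):
        # Similarity of clip i to every second clip in the batch.
        logits = [dot(query, key) / temperature for key in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the same-video clip as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy features: clips of the same video agree (aligned) or are swapped.
aligned = temporal_persistency_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[1.0, 0.0], [0.0, 1.0]])
swapped = temporal_persistency_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case `aligned` is much smaller than `swapped`, since the loss rewards features that stay consistent across clips of one video.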


Pages: 3298-3308

Page count: 11

