Unsupervised Video Representation Learning by Bidirectional Feature Prediction

Cited by: 13

Authors:

Behrmann, Nadine [1]

Gall, Juergen [2]

Noroozi, Mehdi [1]

Affiliations:

[1] Bosch Ctr Artificial Intelligence, Renningen, Germany

[2] Univ Bonn, Bonn, Germany

Source:

2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021) | 2021

DOI:

10.1109/WACV48630.2021.00171

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper introduces a novel method for self-supervised video representation learning via feature prediction. In contrast to previous methods that focus on future feature prediction, we argue that a supervisory signal arising from unobserved past frames is complementary to one that originates from future frames. The rationale behind our method is to encourage the network to explore the temporal structure of videos by distinguishing between future and past given present observations. We train our model in a contrastive learning framework, where joint encoding of future and past provides a comprehensive set of temporal hard negatives via swapping. We empirically show that utilizing both signals enriches the learned representations for the downstream task of action recognition, and that joint prediction outperforms independent prediction of future and past.
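The swapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `swapped_negative_loss`, the use of concatenation as the joint encoding, cosine similarity, and the `temperature` value are all assumptions made for illustration. The sketch scores a prediction made from the present against the correctly ordered (past, future) pair and against the swapped (future, past) pair, which acts as a temporal hard negative in an InfoNCE-style loss. A real training setup would typically add further negatives, for example from other videos.

```python
import numpy as np


def _normalize(x):
    # Scale vectors to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def swapped_negative_loss(pred, past, future, temperature=0.1):
    """InfoNCE-style loss for one sample (hypothetical helper).

    pred   : (2d,) prediction, made from the present, of the joint code
    past   : (d,)  feature of the unobserved past block
    future : (d,)  feature of the future block
    """
    positive = np.concatenate([past, future])   # correct temporal order
    swapped = np.concatenate([future, past])    # temporal hard negative
    candidates = np.stack([positive, swapped])  # index 0 is the positive

    # Similarity of the prediction to each candidate, scaled by temperature.
    logits = _normalize(candidates) @ _normalize(pred) / temperature

    # Numerically stable log-softmax; the loss is the negative
    # log-probability assigned to the positive candidate.
    logits = logits - logits.max()
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[0]


rng = np.random.default_rng(0)
past, future = rng.normal(size=8), rng.normal(size=8)
good = swapped_negative_loss(np.concatenate([past, future]), past, future)
bad = swapped_negative_loss(np.concatenate([future, past]), past, future)
```

With only two candidates, a prediction that cannot tell past from future gives a loss of log 2. In this sketch, a prediction matching the correct order falls below that value and one matching the swapped order rises above it, which is the signal that pushes the network to distinguish the two directions.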

Pages: 1669-1678

Page count: 10

