Video Contrastive Learning with Global Context

Times cited: 37
Authors
Kuang, Haofei [1, 3]
Zhu, Yi [2]
Zhang, Zhi [2]
Li, Xinyu [2]
Tighe, Joseph [2]
Schwertfeger, Soeren [3]
Stachniss, Cyrill [1]
Li, Mu [2]
Affiliations
[1] University of Bonn, Bonn, Germany
[2] Amazon Web Services, Seattle, WA, USA
[3] ShanghaiTech University, Shanghai, People's Republic of China
Source
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021) | 2021
DOI
10.1109/ICCVW54120.2021.00358
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Contrastive learning has revolutionized self-supervised image representation learning and has recently been adapted to the video domain. One of its greatest advantages is that it lets us flexibly define powerful loss objectives, as long as we can find a reasonable way to formulate positive and negative samples to contrast. However, existing approaches rely heavily on short-range spatiotemporal salience to form clip-level contrastive signals, which prevents them from using global context. In this paper, we propose a new video-level contrastive learning method that uses segments to formulate positive pairs. Our formulation is able to capture the global context of a video and is therefore robust to temporal content change. We also incorporate a temporal order regularization term to enforce the inherent sequential structure of videos. Extensive experiments show that our video-level contrastive learning framework (VCLR) outperforms previous state-of-the-art methods on five video datasets for downstream action classification, action localization, and video retrieval.
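The abstract describes the method only at a high level. As a rough illustration of the general idea, here is a minimal NumPy sketch under assumptions that are ours, not taken from the paper: a video is split into equal-length segments with one frame index drawn per segment; per-segment features are averaged into one video-level embedding; two views of the same video form a positive pair and the other videos in the batch act as negatives under a standard (one-directional) InfoNCE loss; and the temporal order regularizer is framed as a hypothetical binary "original order vs. shuffled" label. All function names and the temperature value are hypothetical; this is not VCLR's actual implementation.

```python
import numpy as np


def sample_segment_indices(num_frames, num_segments, rng):
    """Split [0, num_frames) into equal segments and draw one frame index per segment."""
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    edges = np.linspace(0, num_frames, num_segments + 1).astype(int)
    return np.array([rng.integers(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])])


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def video_embedding(segment_feats):
    """Assumed aggregation: average (batch, K, D) segment features into (batch, D)."""
    return l2_normalize(segment_feats.mean(axis=1))


def info_nce(z1, z2, temperature=0.1):
    """InfoNCE: row i of z1 and row i of z2 are positives; other rows are negatives."""
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def temporal_order_pair(segment_feats, rng):
    """Hypothetical order task: return (K, D) features in order (label 1) or shuffled (label 0)."""
    k = segment_feats.shape[0]
    if k < 2:
        raise ValueError("need at least two segments to shuffle")
    if rng.random() < 0.5:
        return segment_feats, 1
    perm = rng.permutation(k)
    while np.array_equal(perm, np.arange(k)):  # make sure the order really changed
        perm = rng.permutation(k)
    return segment_feats[perm], 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(sample_segment_indices(num_frames=300, num_segments=3, rng=rng))
    view1 = rng.normal(size=(4, 3, 16))  # 4 videos, 3 segments, 16-dim features
    view2 = view1 + 0.01 * rng.normal(size=view1.shape)  # stand-in for a second augmented view
    print(info_nce(video_embedding(view1), video_embedding(view2)))
```

Because each embedding pools features from segments spread across the whole video, the positive pair is defined at the video level rather than by two nearby clips, which is the contrast with clip-level methods that the abstract draws. In a real model the features would come from a video encoder and the order label would feed a classification head; both are omitted here.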
Pages: 3188-3197
Page count: 10