Motion-aware Contrastive Video Representation Learning via Foreground-background Merging

Cited by: 24
Authors
Ding, Shuangrui [1]
Li, Maomao [2]
Yang, Tianyu [2]
Qian, Rui [3]
Xu, Haohang [1]
Chen, Qingyi [4]
Wang, Jue [2]
Xiong, Hongkai [1]
Affiliations
[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China
[2] Tencent AI Lab, Shenzhen, People's Republic of China
[3] Chinese University of Hong Kong, Hong Kong, People's Republic of China
[4] University of Michigan, Ann Arbor, MI 48109, USA
Source
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2022
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/CVPR52688.2022.00949
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Following the success of contrastive learning in the image domain, current self-supervised video representation learning methods usually employ a contrastive loss. When two augmented views of a video are naively pulled closer, however, the model tends to learn the common static background as a shortcut and fails to capture motion information, a phenomenon dubbed background bias. This bias weakens the model's generalization ability and leads to worse performance on downstream tasks such as action recognition. To alleviate it, we propose Foreground-background Merging (FAME), which deliberately composes the moving foreground region of a selected video onto the static backgrounds of other videos. Specifically, without any off-the-shelf detector, we separate the moving foreground from the background regions using frame differences and color statistics, and shuffle the background regions among the videos. By leveraging the semantic consistency between the original clips and the fused ones, the model focuses more on motion patterns and is debiased from the background shortcut. Extensive experiments demonstrate that FAME effectively resists background cheating and thus achieves state-of-the-art performance on downstream tasks across the UCF101, HMDB51, and Diving48 datasets. The code and configurations are released at https://github.com/Mark12Ding/FAME.
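The merging step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `foreground_mask` and `merge_foreground_background`, the `keep_ratio` parameter, and the top-k thresholding rule are all assumptions made for illustration, and the color-statistics refinement mentioned in the abstract is omitted. Backgrounds are exchanged with a simple roll along the batch axis rather than a random shuffle, so that every clip is paired with a different video.

```python
import numpy as np

def foreground_mask(clip, keep_ratio=0.5):
    """Estimate a binary foreground mask from frame differences.

    clip: float array of shape (T, H, W, C).
    keep_ratio: hypothetical parameter, the fraction of pixels treated
    as moving foreground. Returns an (H, W) mask of 0s and 1s.
    """
    # Absolute difference between adjacent frames, summed over channels
    # and averaged over time, gives a per-pixel motion score.
    motion = np.abs(clip[1:] - clip[:-1]).sum(axis=-1).mean(axis=0)
    k = max(1, int(round(keep_ratio * motion.size)))
    # The k-th largest motion score serves as the threshold.
    threshold = np.partition(motion.ravel(), -k)[-k]
    # Ties at the threshold are all kept, so the mask can exceed k pixels.
    return (motion >= threshold).astype(clip.dtype)

def merge_foreground_background(batch, keep_ratio=0.5):
    """Paste each clip's foreground onto another clip from the batch.

    batch: float array of shape (B, T, H, W, C).
    Assumption: pixels outside the mask are taken from the other clip.
    """
    masks = np.stack([foreground_mask(c, keep_ratio) for c in batch])
    masks = masks[:, None, :, :, None]  # broadcast over time and channels
    # Rolling by one pairs clip i with clip i-1 as its background source.
    backgrounds = np.roll(batch, shift=1, axis=0)
    return batch * masks + backgrounds * (1.0 - masks)
```

In a contrastive setup, a fused clip and a view of its original clip would form a positive pair, so the only content the two share is the moving foreground.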
Pages: 9706-9716
Page count: 11