Motion-aware Contrastive Video Representation Learning via Foreground-background Merging

Cited by: 24

Authors:

Ding, Shuangrui [1]

Li, Maomao [2]

Yang, Tianyu [2]

Qian, Rui [3]

Xu, Haohang [1]

Chen, Qingyi [4]

Wang, Jue [2]

Xiong, Hongkai [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[2] Tencent AI Lab, Shenzhen, Peoples R China

[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[4] Univ Michigan, Ann Arbor, MI 48109 USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022

Funding:

National Natural Science Foundation of China

DOI:

10.1109/CVPR52688.2022.00949

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In light of the success of contrastive learning in the image domain, current self-supervised video representation learning methods usually employ a contrastive loss. When two augmented views of a video are naively pulled closer, however, the model tends to learn the common static background as a shortcut and fails to capture motion information, a phenomenon dubbed background bias. This bias weakens the model's generalization ability and leads to worse performance on downstream tasks such as action recognition. To alleviate it, we propose Foreground-background Merging (FAME), which deliberately composes the moving foreground region of a selected video onto the static background of other videos. Specifically, without any off-the-shelf detector, we separate the moving foreground from the background regions using frame differences and color statistics, and shuffle the background regions among the videos. By leveraging the semantic consistency between the original clips and the fused ones, the model focuses more on motion patterns and is debiased from the background shortcut. Extensive experiments demonstrate that FAME effectively resists background cheating and thus achieves state-of-the-art performance on downstream tasks across the UCF101, HMDB51, and Diving48 datasets. The code and configurations are released at https://github.com/Mark12Ding/FAME.
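
The merging step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the released FAME code: it scores motion with frame differences only and omits the color-statistics step, and the function names, the `keep_ratio` parameter, and the `perm` argument are hypothetical choices made for illustration. Clips are assumed to be float arrays of shape `(T, H, W, C)`.

```python
import numpy as np

def motion_mask(clip, keep_ratio=0.5):
    """Return a boolean (H, W) mask of the most strongly moving pixels.

    Assumption: motion is scored by frame differences alone; the
    color-statistics refinement mentioned in the abstract is omitted.
    """
    # Absolute difference between adjacent frames, summed over time and channels.
    diff = np.abs(clip[1:] - clip[:-1]).sum(axis=(0, 3))
    # Keep pixels whose score exceeds the (1 - keep_ratio) quantile.
    threshold = np.quantile(diff, 1.0 - keep_ratio)
    return diff > threshold

def merge_foreground_background(batch, keep_ratio=0.5, perm=None, rng=None):
    """Paste each clip's moving region onto the frames of another clip.

    batch: array of shape (B, T, H, W, C).
    perm:  optional index array choosing the background source per clip
           (hypothetical argument; a random permutation is drawn if omitted).
    """
    if perm is None:
        rng = np.random.default_rng() if rng is None else rng
        perm = rng.permutation(len(batch))
    masks = np.stack([motion_mask(c, keep_ratio) for c in batch])  # (B, H, W)
    m = masks[:, None, :, :, None]  # broadcast over time and channels
    # Foreground pixels come from the clip itself, the rest from the shuffled batch.
    return np.where(m, batch, batch[perm])
```

In a contrastive setup, the merged clip and the original clip would be treated as a positive pair, so matching them requires attending to the shared moving region rather than the swapped background.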

Pages: 9706-9716

Page count: 11
