Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

Cited by: 123
Authors
Wang, Jiangliu [1, 3]
Jiao, Jianbo [2, 3]
Bao, Linchao [3]
He, Shengfeng [4]
Liu, Yunhui [1]
Liu, Wei [3]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Oxford, Oxford, England
[3] Tencent AI Lab, Bellevue, WA 98004 USA
[4] South China Univ Technol, Guangzhou, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
DOI
10.1109/CVPR.2019.00413
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are computed merely on a frame-by-frame basis and are therefore not applicable to many video analytic tasks, where spatio-temporal features prevail. In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (fast-motion region and the corresponding dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both spatial and temporal domains. Unlike prior puzzles that are hard even for humans to solve, the proposed tasks are consistent with inherent human visual habits and are therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of our proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_inas.
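A minimal sketch of how motion and appearance statistics of the kind named in the abstract might be turned into self-supervised labels for a clip. Everything here is an illustrative assumption, not the authors' implementation: the hypothetical helpers `block_index`, `motion_statistics`, `quantize` and `appearance_statistics`, the single 2x2 grid pattern, the use of raw optical-flow vectors as the motion cue (the paper may use a different cue), the 8-bin orientation quantization, and the definition of color diversity as the number of distinct quantized colors in a block over time. The released code is the authoritative reference.

```python
import math
from collections import Counter


def block_index(y, x, h, w, grid):
    # Hypothetical helper: map pixel (y, x) of an h x w frame to a block id
    # in a grid x grid partition, numbered row-major from the top-left.
    return (y * grid // h) * grid + (x * grid // w)


def motion_statistics(flow, grid=2, n_bins=8):
    # Assumed input: flow is a list of frames, each an h x w list of (u, v)
    # optical-flow vectors. Returns (block with the largest summed flow
    # magnitude, dominant orientation bin inside that block).
    h, w = len(flow[0]), len(flow[0][0])
    n_blocks = grid * grid
    mag = [0.0] * n_blocks
    hist = [[0.0] * n_bins for _ in range(n_blocks)]
    for frame in flow:
        for y in range(h):
            for x in range(w):
                u, v = frame[y][x]
                m = math.hypot(u, v)
                b = block_index(y, x, h, w, grid)
                mag[b] += m
                # Quantize the flow angle in [0, 2*pi) into n_bins equal sectors.
                angle = math.atan2(v, u) % (2 * math.pi)
                k = int(angle / (2 * math.pi) * n_bins) % n_bins
                hist[b][k] += m  # magnitude-weighted orientation histogram
    fast = max(range(n_blocks), key=lambda i: mag[i])
    direction = max(range(n_bins), key=lambda j: hist[fast][j])
    return fast, direction


def quantize(rgb, levels=4):
    # Map an 8-bit (r, g, b) pixel to a coarse color bin (assumed scheme).
    step = 256 // levels
    return tuple(c // step for c in rgb)


def appearance_statistics(frames, grid=2, levels=4):
    # Assumed input: frames is a list of h x w lists of (r, g, b) pixels.
    # Returns (block with the most distinct quantized colors across the clip,
    # most frequent quantized color inside that block).
    h, w = len(frames[0]), len(frames[0][0])
    n_blocks = grid * grid
    counts = [Counter() for _ in range(n_blocks)]
    for frame in frames:
        for y in range(h):
            for x in range(w):
                b = block_index(y, x, h, w, grid)
                counts[b][quantize(frame[y][x], levels)] += 1
    diverse = max(range(n_blocks), key=lambda i: len(counts[i]))
    dominant = counts[diverse].most_common(1)[0][0]
    return diverse, dominant
```

Under these assumptions, the returned block indices, orientation bin and color bin would act as prediction targets for a 3D CNN such as C3D, so the network has to reason about where and how things move and change color across frames rather than about a single frame. Finer grids or several partition patterns would give richer targets at the cost of a harder task.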
Pages: 4001-4010
Page count: 10