Two-shot Video Object Segmentation

Cited by: 12
Authors
Yan, Kun [1]
Li, Xiao [2]
Wei, Fangyun [2]
Wang, Jinglu [2]
Zhang, Chenbin [1]
Wang, Ping [1]
Lu, Yan [2]
Affiliations
[1] Peking University, Beijing, People's Republic of China
[2] Microsoft Research Asia, Beijing, People's Republic of China
DOI
10.1109/CVPR52729.2023.00224
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Previous video object segmentation (VOS) models are trained on densely annotated videos, yet acquiring pixel-level annotations is expensive and time-consuming. In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos: we require only two labeled frames per training video, and performance is sustained. We term this training paradigm two-shot video object segmentation, or two-shot VOS for short. The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data. Our approach is extremely simple and can be applied to a majority of existing frameworks. We first pre-train a VOS model on sparsely annotated videos in a semi-supervised manner, with the first frame always being a labeled one. Then, we use the pre-trained VOS model to generate pseudo labels for all unlabeled frames, which are stored in a pseudo-label bank. Finally, we retrain a VOS model on both labeled and pseudo-labeled data without any restriction on the first frame. For the first time, we present a general way to train VOS models on two-shot VOS datasets. Using 7.3% and 2.9% of the labeled data of the YouTube-VOS and DAVIS benchmarks, respectively, our approach achieves results comparable to those of counterparts trained on the fully labeled sets. Code and models are available at https://github.com/ykpku/Two-shot-Video-Object-Segmentation.
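The data flow of the three phases described in the abstract can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation (which is in the repository linked above): the function names, the dictionary layout, the clip-sampling rule, and the string stand-ins for segmentation masks are all assumptions made for the example, and the toy labeled fraction is unrelated to the 7.3% and 2.9% benchmark figures.

```python
import random


def build_pseudo_label_bank(videos, predict):
    """Phase 2 (sketch): store a predicted label for every frame without ground truth."""
    bank = {}
    for vid, video in videos.items():
        for idx in range(video["num_frames"]):
            if idx not in video["labels"]:
                bank[(vid, idx)] = predict(vid, idx)
    return bank


def sample_clip(video, clip_len, first_labeled, rng):
    """Sample frame indices for one training clip (hypothetical sampling rule).

    first_labeled=True  -> phase 1: the first frame must have a ground-truth label.
    first_labeled=False -> phase 3: any frame may come first.
    """
    frames = list(range(video["num_frames"]))
    if first_labeled:
        first = rng.choice(sorted(video["labels"]))
        others = [f for f in frames if f != first]
        return [first] + sorted(rng.sample(others, clip_len - 1))
    return sorted(rng.sample(frames, clip_len))


def label_for(vid, idx, videos, bank):
    """Prefer the ground-truth label; fall back to the pseudo-label bank."""
    ground_truth = videos[vid]["labels"]
    return ground_truth[idx] if idx in ground_truth else bank[(vid, idx)]


def labeled_fraction(videos):
    """Share of frames that carry a ground-truth label."""
    labeled = sum(len(v["labels"]) for v in videos.values())
    total = sum(v["num_frames"] for v in videos.values())
    return labeled / total


# Toy data: one 10-frame video with two labeled frames (the two-shot setting).
videos = {"v0": {"num_frames": 10, "labels": {2: "gt-mask-2", 7: "gt-mask-7"}}}
rng = random.Random(0)

# Phase 1: clips always start from a labeled frame.
phase1_clip = sample_clip(videos["v0"], 3, True, rng)

# Phase 2: a stand-in predictor plays the role of the pre-trained model.
bank = build_pseudo_label_bank(videos, lambda vid, idx: f"pseudo-mask-{idx}")

# Phase 3: clips are unrestricted; labels come from ground truth or the bank.
phase3_clip = sample_clip(videos["v0"], 3, False, rng)
phase3_labels = [label_for("v0", i, videos, bank) for i in phase3_clip]
```

In this sketch the only difference between phase 1 and phase 3 sampling is the `first_labeled` flag, which mirrors the abstract's statement that the restriction on the first frame is lifted once the pseudo-label bank exists.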
Pages: 2257-2267
Page count: 11