Self-Supervised Progressive Network for High Performance Video Object Segmentation

Cited by: 4
Authors
Li, Guorong [1]
Hong, Dexiang [1]
Xu, Kai [1]
Zhong, Bineng [2]
Su, Li [1]
Han, Zhenjun [3]
Huang, Qingming [1]
Affiliations
[1] Univ Chinese Academy Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[3] Univ Chinese Acad Sci UCAS, Sch Elect Elect & Commun Engn, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Customer relationship management; Semantics; Object segmentation; Collaboration; Visualization; Decoding; Cycle consistency; self-supervised; similarity learning; video object segmentation (VOS); TRACKING;
DOI
10.1109/TNNLS.2022.3219936
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, self-supervised video object segmentation (VOS) has attracted much interest. However, most proxy tasks are proposed to train only a single backbone, which relies on a point-to-point correspondence strategy to propagate masks through a video sequence. Due to its simple pipeline, the performance of the single-backbone paradigm is still unsatisfactory. Instead of following the previous literature, we propose a self-supervised progressive network (SSPNet), which consists of a memory retrieval module (MRM) and a collaborative refinement module (CRM). The MRM can perform point-to-point correspondence and produce a propagated coarse mask for a query frame through self-supervised pixel-level and frame-level similarity learning. The CRM, which is trained via cycle-consistency region tracking, aggregates the reference and query information and implicitly learns the collaborative relationship between them to refine the coarse mask. Furthermore, to learn semantic knowledge from unlabeled data, we also design two novel mask-generation strategies to provide the training data with meaningful semantic information for the CRM. Extensive experiments conducted on DAVIS-17, YouTube-VOS, and SegTrack v2 demonstrate that our method surpasses the state-of-the-art self-supervised methods and narrows the gap with the fully supervised methods.
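The point-to-point correspondence step that the abstract attributes to the MRM can be illustrated with a generic affinity-based mask propagation. The following is a minimal NumPy sketch, not the authors' implementation: the function name `propagate_mask`, the cosine-similarity affinity, and the `temperature` value are all assumptions made for illustration, and the paper's frame-level similarity learning and the CRM refinement stage are not modeled here.

```python
import numpy as np


def propagate_mask(ref_feat, qry_feat, ref_mask, temperature=0.07):
    """Propagate a reference mask to a query frame via pixel affinities.

    Hypothetical helper for illustration only.
    ref_feat: (N, C) features of N reference pixels.
    qry_feat: (M, C) features of M query pixels.
    ref_mask: (N, K) one-hot or soft labels over K objects.
    Returns an (M, K) soft mask for the query pixels.
    """
    # L2-normalise so the dot product below is a cosine similarity.
    ref = ref_feat / (np.linalg.norm(ref_feat, axis=1, keepdims=True) + 1e-8)
    qry = qry_feat / (np.linalg.norm(qry_feat, axis=1, keepdims=True) + 1e-8)

    # Affinity between every query pixel and every reference pixel.
    sim = (qry @ ref.T) / temperature            # shape (M, N)
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    aff = np.exp(sim)
    aff = aff / aff.sum(axis=1, keepdims=True)   # row-wise softmax

    # Each query pixel receives an affinity-weighted mix of reference labels.
    return aff @ ref_mask


# Toy usage: two reference pixels, one per object, and two query pixels.
ref_feat = np.array([[1.0, 0.0], [0.0, 1.0]])
ref_mask = np.array([[1.0, 0.0], [0.0, 1.0]])
qry_feat = np.array([[0.9, 0.1], [0.1, 0.9]])
coarse = propagate_mask(ref_feat, qry_feat, ref_mask)
labels = coarse.argmax(axis=1)  # per-pixel object index in the query frame
```

In a progressive design such as the one the abstract describes, a coarse mask of this kind would then be passed to a separate refinement stage rather than used directly as the final output.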
Pages: 7671-7684
Page count: 14