Self-Supervised Progressive Network for High Performance Video Object Segmentation

Cited by: 4

Authors:

Li, Guorong [1]

Hong, Dexiang [1]

Xu, Kai [1]

Zhong, Bineng [2]

Su, Li [1]

Han, Zhenjun [3]

Huang, Qingming [1]

Affiliations:

[1] Univ Chinese Academy Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China

[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

[3] Univ Chinese Acad Sci UCAS, Sch Elect Elect & Commun Engn, Beijing 101408, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Customer relationship management; Semantics; Object segmentation; Collaboration; Visualization; Decoding; Cycle consistency; self-supervised; similarity learning; video object segmentation (VOS); TRACKING;

DOI:

10.1109/TNNLS.2022.3219936

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, self-supervised video object segmentation (VOS) has attracted much interest. However, most proxy tasks are designed to train only a single backbone, which relies on a point-to-point correspondence strategy to propagate masks through a video sequence. Because of this simple pipeline, the performance of the single-backbone paradigm remains unsatisfactory. Instead of following the previous literature, we propose a self-supervised progressive network (SSPNet), which consists of a memory retrieval module (MRM) and a collaborative refinement module (CRM). The MRM performs point-to-point correspondence and produces a propagated coarse mask for a query frame through self-supervised pixel-level and frame-level similarity learning. The CRM, which is trained via cycle-consistency region tracking, aggregates the reference and query information and implicitly learns the collaborative relationship between them to refine the coarse mask. Furthermore, to learn semantic knowledge from unlabeled data, we design two novel mask-generation strategies that provide the CRM with training data carrying meaningful semantic information. Extensive experiments on DAVIS-17, YouTube-VOS, and SegTrack v2 demonstrate that our method surpasses state-of-the-art self-supervised methods and narrows the gap with fully supervised methods.
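
Illustrative note: the point-to-point correspondence strategy mentioned in the abstract is commonly realized as affinity-based label propagation, where each query pixel receives a similarity-weighted average of the reference labels. The sketch below is a generic toy version of that idea in plain Python, not the SSPNet MRM itself. The function name `propagate_mask`, the `temperature` parameter, and the two-dimensional toy features are all assumptions made for illustration.

```python
import math


def propagate_mask(ref_feats, ref_mask, qry_feats, temperature=0.1):
    """Propagate a soft mask from reference pixels to query pixels.

    ref_feats: list of feature vectors, one per reference pixel.
    ref_mask:  list of foreground probabilities in [0, 1], one per reference pixel.
    qry_feats: list of feature vectors, one per query pixel.
    Returns a list of foreground probabilities, one per query pixel.
    (Hypothetical helper; the temperature value is an assumption.)
    """
    propagated = []
    for q in qry_feats:
        # Scaled dot-product similarity between this query pixel and every reference pixel.
        sims = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref_feats]
        # Numerically stable softmax turns similarities into affinity weights.
        peak = max(sims)
        weights = [math.exp(s - peak) for s in sims]
        total = sum(weights)
        # The propagated label is the affinity-weighted average of the reference labels.
        propagated.append(sum(w * y for w, y in zip(weights, ref_mask)) / total)
    return propagated


# Toy data: one foreground and one background reference pixel.
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_mask = [1.0, 0.0]
qry_feats = [[0.9, 0.1], [0.1, 0.9]]
coarse = propagate_mask(ref_feats, ref_mask, qry_feats)
```

In this toy setting the first query pixel resembles the foreground reference and the second resembles the background reference, so `coarse` comes out close to 1 and close to 0 respectively. A coarse mask of this kind is what a later refinement stage would take as input.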

Pages: 7671-7684

Page count: 14
