MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Cited by: 0

Authors:

Zhang, Zhenghao [1]

Zhang, Shengfan [1]

Dai, Zuozhuo [1]

Dong, Zilong [1]

Zhu, Siyu [2]

Affiliations:

[1] Alibaba Grp, Hangzhou 310030, Peoples R China

[2] Fudan Univ, Shanghai 200433, Peoples R China

Source:

PATTERN RECOGNITION | 2025 / Vol. 159

Keywords:

Vision foundation model; Video instance segmentation; Deep learning;

DOI:

10.1016/j.patcog.2024.111100

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The current state-of-the-art techniques for video object segmentation necessitate extensive training on video datasets with mask annotations, thereby constraining their ability to transfer zero-shot learning to new image distributions and tasks. However, recent advancements in foundation models, particularly in the domain of image segmentation, have showcased robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study delves into the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments conducted on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.


Pages: 12

