MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Cited by: 0
Authors
Zhang, Zhenghao [1]
Zhang, Shengfan [1]
Dai, Zuozhuo [1]
Dong, Zilong [1]
Zhu, Siyu [2]
Affiliations
[1] Alibaba Grp, Hangzhou 310030, Peoples R China
[2] Fudan Univ, Shanghai 200433, Peoples R China
Keywords
Vision foundation model; Video instance segmentation; Deep learning
DOI
10.1016/j.patcog.2024.111100
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current state-of-the-art techniques for video object segmentation require extensive training on video datasets with mask annotations, which limits their zero-shot transfer to new image distributions and tasks. However, recent advances in foundation models, particularly for image segmentation, have shown robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study explores the potential of vision foundation models under diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate that the proposed approach, without mask supervision, outperforms existing mask-supervised methods and generalizes to weakly annotated video datasets.
Pages: 12