SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark

Authors
Yang, Ruolin [1]
Li, Da [2]
Hu, Conghui [3]
Zhang, Honggang [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, SketchX, Surrey GU2 7XH, England
[3] Natl Univ Singapore, Dept Comp Sci, Singapore 119077, Singapore
Source
APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 04
Keywords
sketches; video object segmentation; sketch-based datasets
DOI
10.3390/app15041751
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In this paper, we propose sketch-based video object segmentation (SKVOS), a novel task that segments objects consistently across video frames using human-drawn sketches as queries. Traditional references, such as photo masks and language descriptions, are commonly used for segmentation. Photo masks provide high precision but are labor-intensive, limiting scalability. While language descriptions are easy to provide, they often lack the specificity needed to distinguish visually similar objects within a frame. Despite their simplicity, sketches capture rich, fine-grained details of target objects and can be rapidly created, even by non-experts, making them an attractive alternative for segmentation tasks. We introduce a new approach that utilizes sketches as efficient and informative references for video object segmentation. To evaluate sketch-guided segmentation, we introduce a new benchmark consisting of three datasets: Sketch-DAVIS16, Sketch-DAVIS17, and Sketch-YouTube-VOS. Building on a memory-based framework for semi-supervised video object segmentation, we explore effective strategies for integrating sketch-based references. To ensure robust spatiotemporal coherence, we introduce two key innovations: the Temporal Relation Module and Sketch-Anchored Contrastive Learning. These modules enhance the model's ability to maintain consistency both across time and across different object instances. Evaluated on the Sketch-VOS benchmark, our method achieves overall improvements of 1.9%, 3.3%, and 2.0% over state-of-the-art methods on the Sketch-YouTube-VOS, Sketch-DAVIS 2016, and Sketch-DAVIS 2017 validation sets, respectively. Additionally, on the YouTube-VOS validation set, our method outperforms the leading language-based VOS approach by 10.1%.
Pages: 29