SKVOS: Sketch-Based Video Object Segmentation with a Large-Scale Benchmark

Cited by: 0

Authors:

Yang, Ruolin ^{[1]}

Li, Da ^{[2]}

Hu, Conghui ^{[3]}

Zhang, Honggang ^{[1]}

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China

[2] Univ Surrey, Ctr Vis Speech & Signal Proc, SketchX, Surrey GU2 7XH, England

[3] Natl Univ Singapore, Dept Comp Sci, Singapore 119077, Singapore

Source:

APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 04

Keywords:

sketches; video object segmentation; sketch-based datasets

DOI:

10.3390/app15041751

Chinese Library Classification number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

In this paper, we propose sketch-based video object segmentation (SKVOS), a novel task that segments objects consistently across video frames using human-drawn sketches as queries. Traditional reference-based methods, such as photo masks and language descriptions, are commonly used for segmentation. Photo masks provide high precision but are labor intensive, limiting scalability. While language descriptions are easy to provide, they often lack the specificity needed to distinguish visually similar objects within a frame. Despite their simplicity, sketches capture rich, fine-grained details of target objects and can be rapidly created, even by non-experts, making them an attractive alternative for segmentation tasks. We introduce a new approach that utilizes sketches as efficient and informative references for video object segmentation. To evaluate sketch-guided segmentation, we introduce a new benchmark consisting of three datasets: Sketch-DAVIS16, Sketch-DAVIS17, and Sketch-YouTube-VOS. Building on a memory-based framework for semi-supervised video object segmentation, we explore effective strategies for integrating sketch-based references. To ensure robust spatiotemporal coherence, we introduce two key innovations: the Temporal Relation Module and Sketch-Anchored Contrastive Learning. These modules enhance the model's ability to maintain consistency both across time and across different object instances. Our method is evaluated on the Sketch-VOS benchmark, demonstrating superior performance with overall improvements of 1.9%, 3.3%, and 2.0% over state-of-the-art methods on the Sketch-YouTube-VOS, Sketch-DAVIS 2016, and Sketch-DAVIS 2017 validation sets, respectively. Additionally, on the YouTube-VOS validation set, our method outperforms the leading language-based VOS approach by 10.1%.

Pages: 29

50 items in total

[1] Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
Miao, Jiaxu
Wang, Xiaohan
Wu, Yu
Li, Wei
Zhang, Xu
Wei, Yunchao
Yang, Yi
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 21001 - 21011
[2] Edgel Index for Large-Scale Sketch-based Image Search
Cao, Yang
Wang, Changhu
Zhang, Liqing
Zhang, Lei
2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011: 761 - 768
[3] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Ding, Henghui
Liu, Chang
He, Shuting
Jiang, Xudong
Loy, Chen Change
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 2694 - 2703
[4] A Large-Scale Benchmark for Food Image Segmentation
Wu, Xiongwei
Fu, Xin
Liu, Ying
Lim, Ee-Peng
Hoi, Steven C. H.
Sun, Qianru
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 506 - 515
[5] ClearPose: Large-scale Transparent Object Dataset and Benchmark
Chen, Xiaotong
Zhang, Huijie
Yu, Zeren
Opipari, Anthony
Jenkins, Odest Chadwicke
COMPUTER VISION, ECCV 2022, PT VIII, 2022, 13668 : 381 - 396
[6] IdeaPanel: A Large Scale Interactive Sketch-based Image Search System
Xiao, Changcheng
Wang, Changhu
Zhang, Liqing
Zhang, Lei
ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2015: 667 - 668
[7] Sketch-based evaluation of image segmentation methods
Gavilan, David
Takahashi, Hiroki
Saito, Suguru
Nakajima, Masayuki
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (01) : 156 - 164
[8] Sketch-Based Retrieval in Large-Scale Image Database via Position-Aware Silhouette Matching
Hu, Shijie
Zhang, Hongxin
Zhang, Sanyuan
Fang, Zishuo
Huang, Qi
E-LEARNING AND GAMES, 2016, 9654 : 243 - 256
[9] Sketch-Based Annotation and Visualization in Video Authoring
Ma, Cui-Xia
Liu, Yong-Jin
Wang, Hong-An
Teng, Dong-Xing
Dai, Guo-Zhong
IEEE TRANSACTIONS ON MULTIMEDIA, 2012, 14 (04) : 1153 - 1165
[10] Collaborative Camouflaged Object Detection: A Large-Scale Dataset and Benchmark
Zhang, Cong
Bi, Hongbo
Xiang, Tian-Zhu
Wu, Ranwan
Tong, Jinghui
Wang, Xiufang
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 35 (12) : 1 - 15