Tracking Anything with Decoupled Video Segmentation

Cited by: 47

Authors:

Cheng, Ho Kei [1]

Oh, Seoung Wug [2]

Price, Brian [2]

Schwing, Alexander [1]

Lee, Joon-Young [2]

Affiliations:

[1] Univ Illinois, Urbana, IL 61801 USA

[2] Adobe Res, San Francisco, CA USA

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023

Keywords:

DOI:

10.1109/ICCV51070.2023.00127

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: hkchengrex.github.io/Tracking-Anything-with-DEVA.


Pages: 1316-1326

Page count: 11
