Integrating instance-level knowledge to see the unseen: A two-stream network for video object segmentation

Cited by: 1

Authors:

Lu, Hannan [1]

Tian, Zhi [1]

Wei, Pengxu [1]

Ren, Haibing [1]

Zuo, Wangmeng [1]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, 92 Xidazhi St, Harbin 150006, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Volume 602

Funding:

National Natural Science Foundation of China;

Keywords:

Video object segmentation; Matching-based; Two-stream network; Pixel division; Instance stream;

DOI:

10.1016/j.neucom.2024.127878

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing matching-based video object segmentation (VOS) approaches carry inherent limitations in segmenting pixels that have never appeared in the previous frames (i.e., unseen pixels). In this paper, we introduce a Two-Stream Network (TSN), which addresses this issue by softly distinguishing between seen and unseen pixels and processing them with two streams. In particular, a pixel division module is devised to generate a routing map that distinguishes between seen and unseen pixels. Guided by the routing map, TSN explicitly integrates instance-level knowledge from an instance stream and pixel-level information from a pixel stream to generate the final segmentation result. The soft partitioning strategy allows for flexibility and adaptability in the fusion process. Additionally, the compact instance stream encodes and leverages instance-level knowledge, improving segmentation accuracy on the unseen pixels. Extensive experiments demonstrate the effectiveness of the proposed TSN, and we also report state-of-the-art performance on public VOS benchmarks.
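The routing-map-guided fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_streams`, the per-pixel convex combination, and the convention that a routing value near 1 favors the pixel stream ("seen") while a value near 0 favors the instance stream ("unseen") are all assumptions made here for illustration.

```python
from typing import List

Grid = List[List[float]]  # H x W map of per-pixel values


def fuse_streams(routing: Grid, pixel_prob: Grid, instance_prob: Grid) -> Grid:
    """Blend two per-pixel foreground probabilities with a soft routing map.

    Assumption (not from the paper): routing[y][x] in [0, 1] is the weight
    given to the pixel stream ("seen" pixels); 1 - routing[y][x] goes to the
    instance stream ("unseen" pixels).
    """
    fused: Grid = []
    for r_row, p_row, i_row in zip(routing, pixel_prob, instance_prob):
        # Convex combination per pixel: a soft split rather than a hard one.
        fused.append([r * p + (1.0 - r) * i for r, p, i in zip(r_row, p_row, i_row)])
    return fused


# Toy 2 x 2 example with hypothetical values.
routing = [[1.0, 0.0], [0.5, 0.25]]
pixel_prob = [[0.8, 0.8], [0.8, 0.8]]
instance_prob = [[0.2, 0.2], [0.2, 0.2]]
fused = fuse_streams(routing, pixel_prob, instance_prob)
```

With a routing value of 1 the fused output equals the pixel-stream prediction, with 0 it equals the instance-stream prediction, and intermediate values blend the two, which is one simple way to realize a soft rather than hard partition.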

Pages: 12