Integrating instance-level knowledge to see the unseen: A two-stream network for video object segmentation

Citations: 0
Authors
Lu, Hannan [1]
Tian, Zhi [1]
Wei, Pengxu [1]
Ren, Haibing [1]
Zuo, Wangmeng [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, 92 Xidazhi St, Harbin 150006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video object segmentation; Matching-based; Two-stream network; Pixel division; Instance stream;
DOI
10.1016/j.neucom.2024.127878
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing matching-based video object segmentation (VOS) approaches are inherently limited in segmenting pixels that have never appeared in previous frames (i.e., unseen pixels). In this paper, we introduce a Two-Stream Network (TSN), which addresses this issue by softly distinguishing between seen and unseen pixels and processing them with two streams. Specifically, a pixel division module is devised to generate a routing map that distinguishes between seen and unseen pixels. Guided by the routing map, TSN explicitly integrates instance-level knowledge from an instance stream and pixel-level information from a pixel stream to generate the final segmentation result. The soft partitioning strategy allows for flexibility and adaptability in the fusion process. Additionally, the compact instance stream encodes and leverages instance-level knowledge, improving segmentation accuracy on the unseen pixels. Extensive experiments demonstrate the effectiveness of the proposed TSN, which achieves state-of-the-art performance on public VOS benchmarks.
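The routing-map-guided fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so the convex blend below is an assumption rather than the authors' method, as is the convention that a routing value near 1 marks a seen pixel (trust the pixel stream) and a value near 0 marks an unseen pixel (trust the instance stream). The function name `soft_fuse` and the toy arrays are hypothetical.

```python
import numpy as np


def soft_fuse(pixel_prob, instance_prob, routing_map):
    """Blend two per-pixel foreground probability maps with a soft routing map.

    Assumed convention (not taken from the paper): routing_map is in [0, 1],
    where 1 means "seen" (use the pixel stream) and 0 means "unseen"
    (use the instance stream). Intermediate values mix both streams.
    """
    routing_map = np.clip(routing_map, 0.0, 1.0)
    return routing_map * pixel_prob + (1.0 - routing_map) * instance_prob


# Toy 2x2 example with made-up probabilities.
pixel_prob = np.array([[0.9, 0.2], [0.8, 0.1]])
instance_prob = np.array([[0.7, 0.9], [0.6, 0.3]])
routing_map = np.array([[1.0, 0.0], [0.5, 0.25]])

fused = soft_fuse(pixel_prob, instance_prob, routing_map)
print(fused)
```

Because the routing values are continuous rather than binary, a pixel that is only partly explained by earlier frames receives a mixture of both predictions, which is one way to read the "soft partitioning" the abstract mentions.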
Pages: 12
Related papers
66 in total
[1]   All about VLAD [J].
Arandjelovic, Relja ;
Zisserman, Andrew .
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, :1578-1585
[2]   One-Shot Video Object Segmentation [J].
Caelles, S. ;
Maninis, K. -K. ;
Pont-Tuset, J. ;
Leal-Taixe, L. ;
Cremers, D. ;
Van Gool, L. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :5320-5329
[3]   Dynamic Convolution: Attention over Convolution Kernels [J].
Chen, Yinpeng ;
Dai, Xiyang ;
Liu, Mengchen ;
Chen, Dongdong ;
Yuan, Lu ;
Liu, Zicheng .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11027-11036
[4]  
CHENG HK, 2021, ADV NEUR IN, V34, pNI249
[5]   XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model [J].
Cheng, Ho Kei ;
Schwing, Alexander G. .
COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 :640-658
[6]   SegFlow: Joint Learning for Video Object Segmentation and Optical Flow [J].
Cheng, Jingchun ;
Tsai, Yi-Hsuan ;
Wang, Shengjin ;
Yang, Ming-Hsuan .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :686-695
[7]   SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation [J].
Duke, Brendan ;
Ahmed, Abdalla ;
Wolf, Christian ;
Aarabi, Parham ;
Taylor, Graham W. .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :5908-5917
[8]   Semi-Supervised Video Object Segmentation via Learning Object-Aware Global-Local Correspondence [J].
Fan, Jiaqing ;
Liu, Bo ;
Zhang, Kaihua ;
Liu, Qingshan .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) :8153-8164
[9]   Video object segmentation based on multi-level target models and feature integration [J].
Gao, Bocong ;
Zhao, Yuqian ;
Zhang, Fan ;
Luo, Biao ;
Yang, Chunhua .
NEUROCOMPUTING, 2022, 492 :396-407
[10]   Decoupling Multimodal Transformers for Referring Video Object Segmentation [J].
Gao, Mingqi ;
Yang, Jinyu ;
Han, Jungong ;
Lu, Ke ;
Zheng, Feng ;
Montana, Giovanni .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) :4518-4528