Coherence-aware context aggregator for fast video object segmentation

Cited by: 23
Authors
Lan, Meng [1]
Zhang, Jing [2]
Wang, Zengmao [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Univ Sydney, Sch Comp Sci, Camperdown, Australia
Funding
National Natural Science Foundation of China;
Keywords
Video object segmentation; Semi-supervised learning; Spatio-temporal representation; Context;
DOI
10.1016/j.patcog.2022.109214
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semi-supervised video object segmentation (VOS) is a highly challenging problem that has attracted much research attention in recent years. Temporal context plays an important role in VOS by providing object clues from past frames. However, most prevailing methods directly use the predicted temporal results to guide the segmentation of the current frame while ignoring the coherence of the temporal context, which may be misleading and degrade performance. In this paper, we propose a novel model named Coherence-aware Context Aggregator (CCA) for VOS, which consists of three modules. First, a coherence-aware module (CAM) evaluates the coherence of the predicted result of the current frame and then fuses the coherent features to update the temporal context. CAM can determine whether the prediction is accurate, thus guiding the update of the temporal context and avoiding the introduction of erroneous information. Second, we devise a spatio-temporal context aggregation (STCA) module that aggregates the temporal context with the spatial feature of the current frame to learn a robust and discriminative target representation in the decoder part. Third, we design a refinement module to refine the coarse feature generated by the STCA module for more precise segmentation. Additionally, CCA uses a cropping strategy and takes small-size images as input, which makes it computationally efficient and lets it run in real time. Extensive experiments on four challenging benchmarks show that CCA achieves a better trade-off between efficiency and accuracy than state-of-the-art methods. The code will be made public. (c) 2022 Elsevier Ltd. All rights reserved.
Pages: 12