Full-Duplex Strategy for Video Object Segmentation

Cited by: 111
Authors
Ji, Ge-Peng [1, 2]
Fu, Keren [3]
Wu, Zhe [4]
Fan, Deng-Ping [1]
Shen, Jianbing [5]
Shao, Ling [1]
Affiliations
[1] IIAI, Hong Kong, Peoples R China
[2] Wuhan Univ, Sch CS, Wuhan, Peoples R China
[3] Sichuan Univ, Coll CS, Chengdu, Peoples R China
[4] Peng Cheng Lab, Chengdu, Peoples R China
[5] Univ Macau, Dept CIS, Zhuhai, Peoples R China
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
China Postdoctoral Science Foundation;
Keywords
SALIENCY DETECTION; OPTIMIZATION;
DOI
10.1109/ICCV48922.2021.00488
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Appearance and motion are two important sources of information in video object segmentation (VOS). Previous methods mainly focus on using simplex solutions, lowering the upper bound of feature collaboration among and across these two cues. In this paper, we study a novel framework, termed FSNet (Full-duplex Strategy Network), which designs a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding subspaces. Furthermore, the bidirectional purification module (BPM) is introduced to update the inconsistent features between the spatial-temporal embeddings, effectively improving model robustness. By considering the mutual restraint within the full-duplex strategy, our FSNet performs cross-modal feature-passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage, making it robust to various challenging scenarios (e.g., motion blur, occlusion) in VOS. Extensive experiments on five popular benchmarks (i.e., DAVIS16, FBMS, MCL, SegTrack-V2, and DAVSOD19) show that our FSNet outperforms other state-of-the-art methods for both the VOS and video salient object detection tasks.
Pages: 4902-4913
Number of pages: 12