Full-Duplex Strategy for Video Object Segmentation

Cited by: 111

Authors:

Ji, Ge-Peng [1,2]

Fu, Keren [3]

Wu, Zhe [4]

Fan, Deng-Ping [1]

Shen, Jianbing [5]

Shao, Ling [1]

Affiliations:

[1] IIAI, Hong Kong, Peoples R China

[2] Wuhan Univ, Sch CS, Wuhan, Peoples R China

[3] Sichuan Univ, Coll CS, Chengdu, Peoples R China

[4] Peng Cheng Lab, Chengdu, Peoples R China

[5] Univ Macau, Dept CIS, Zhuhai, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

China Postdoctoral Science Foundation;

Keywords:

SALIENCY DETECTION; OPTIMIZATION;

DOI:

10.1109/ICCV48922.2021.00488

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Appearance and motion are two important sources of information in video object segmentation (VOS). Previous methods mainly focus on using simplex solutions, lowering the upper bound of feature collaboration among and across these two cues. In this paper, we study a novel framework, termed FSNet (Full-duplex Strategy Network), which designs a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding subspaces. Furthermore, the bidirectional purification module (BPM) is introduced to update the inconsistent features between the spatial-temporal embeddings, effectively improving the model's robustness. By considering the mutual restraint within the full-duplex strategy, our FSNet performs the cross-modal feature-passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage, making it robust to various challenging scenarios (e.g., motion blur, occlusion) in VOS. Extensive experiments on five popular benchmarks (i.e., DAVIS16, FBMS, MCL, SegTrack-V2, and DAVSOD19) show that our FSNet outperforms other state-of-the-art methods for both the VOS and video salient object detection tasks.

Pages: 4902-4913

Page count: 12

References: 129 in total

