F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

Times Cited: 0
Authors
Liu, Daizong [1 ]
Yu, Dongdong [2 ]
Wang, Changhu [2 ]
Zhou, Pan [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] ByteDance AI Lab, Wuhan, Peoples R China
Source
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021, Vol. 35
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Although deep learning based methods have achieved great progress in unsupervised video object segmentation, difficult scenarios (e.g., visual similarity, occlusions, and appearance changes) are still not well handled. To alleviate these issues, we propose a novel Focus on Foreground Network (F2Net), which delves into intra- and inter-frame details of the foreground objects and thus effectively improves segmentation performance. Specifically, our proposed network consists of three main parts: a Siamese Encoder Module, a Center Guiding Appearance Diffusion Module, and a Dynamic Information Fusion Module. First, a Siamese encoder extracts feature representations of paired frames (the reference frame and the current frame). Then, the Center Guiding Appearance Diffusion Module captures the inter-frame features (dense correspondences between the reference and current frames), the intra-frame features (dense correspondences within the current frame), and the original semantic features of the current frame. In particular, a Center Prediction Branch predicts the center location of the foreground object in the current frame, and this center point serves as a spatial guidance prior that enhances inter-frame and intra-frame feature extraction, so that the feature representations focus strongly on the foreground objects. Finally, the Dynamic Information Fusion Module automatically selects the most relevant of these three levels of features. Extensive experiments on the DAVIS2016, YouTube-Objects, and FBMS datasets show that our proposed F2Net achieves state-of-the-art performance with significant improvements.
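The center-guided diffusion step lends itself to a compact illustration. Below is a minimal PyTorch sketch of the idea as the abstract describes it: dense non-local attention between two feature maps, biased by a soft prior around the predicted object center. All names, shapes, and the Gaussian form of the prior (gaussian_center_prior, sigma, etc.) are illustrative assumptions, not the authors' implementation.

import torch

def gaussian_center_prior(center, height, width, sigma=0.1):
    # Soft spatial prior: a Gaussian bump around the predicted object center.
    # center: (B, 2) tensor of (x, y) in normalized [0, 1] coordinates.
    # Returns a (B, H*W) weight map emphasizing positions near the center.
    ys = torch.linspace(0, 1, height, device=center.device)
    xs = torch.linspace(0, 1, width, device=center.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")          # (H, W) each
    dist2 = (grid_x[None] - center[:, 0, None, None]) ** 2 \
          + (grid_y[None] - center[:, 1, None, None]) ** 2          # (B, H, W)
    return torch.exp(-dist2 / (2 * sigma ** 2)).flatten(1)          # (B, H*W)

def center_guided_attention(query_feat, key_feat, center, sigma=0.1):
    # Dense correspondence (non-local attention) between two feature maps,
    # biased by the center prior so matches concentrate on the foreground.
    # query_feat: current-frame features, (B, C, H, W).
    # key_feat:   reference-frame features (inter-frame branch) or the
    #             current frame itself (intra-frame branch).
    b, c, h, w = query_feat.shape
    q = query_feat.flatten(2).transpose(1, 2)                       # (B, HW, C)
    k = key_feat.flatten(2)                                         # (B, C, HW)
    affinity = torch.bmm(q, k) / c ** 0.5                           # (B, HW, HW)
    prior = gaussian_center_prior(center, h, w, sigma)              # (B, HW)
    # Additively bias every query's attention toward positions near the center.
    affinity = affinity + prior.log().clamp(min=-10.0).unsqueeze(1)
    attn = affinity.softmax(dim=-1)
    out = torch.bmm(attn, k.transpose(1, 2))                        # (B, HW, C)
    return out.transpose(1, 2).reshape(b, c, h, w)

# Toy usage: one call per branch, sharing the predicted center.
ref, cur = torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16)
center = torch.tensor([[0.5, 0.5], [0.3, 0.7]])                     # predicted (x, y)
inter = center_guided_attention(cur, ref, center)                   # inter-frame feature
intra = center_guided_attention(cur, cur, center)                   # intra-frame feature
print(inter.shape, intra.shape)                                     # (2, 64, 16, 16) each

In F2Net the resulting inter- and intra-frame features are further combined with the current frame's semantic features by the Dynamic Information Fusion Module; the gating used there is omitted from this sketch.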
Pages: 2109-2117
Number of Pages: 9