Scribble-Supervised Video Object Segmentation

Cited by: 79

Authors:

Huang, Peiliang [1]

Han, Junwei [1]

Liu, Nian [2]

Ren, Jun [3]

Zhang, Dingwen [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Automat, Brain & Artificial Intelligence Lab, Xian 710072, Peoples R China

[2] Mohamed Bin Zayed Univ Artificial Intelligence, Dept Engagement Serv, Abu Dhabi, U Arab Emirates

[3] Sci & Technol Complex Syst Control & Intelligent, Beijing, Peoples R China

Source:

IEEE-CAA JOURNAL OF AUTOMATICA SINICA | 2022 / Vol. 9 / No. 02

Funding:

U.S. National Science Foundation; National Key Research and Development Program;

Keywords:

Convolutional neural networks (CNNs); scribble; self-attention; video object segmentation; weakly supervised;

DOI:

10.1109/JAS.2021.1004210

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt to achieve video object segmentation with scribble-level supervision, which greatly reduces the human labor needed to collect manual annotations. However, conventional network architectures and learning objective functions do not work well in this scenario, because the supervision information is highly sparse and incomplete. To address this issue, this paper introduces two novel elements for learning the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The second is the scribble-supervised loss, which can optimize the unlabeled pixels and dynamically correct inaccurately segmented areas during the training stage. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmark datasets: YouTube-VOS (video object segmentation) and DAVIS-2017 (densely annotated video segmentation). We first generate the scribble annotations from the original per-pixel annotations. Then, we train our model and compare its test performance with the baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods that require dense per-pixel annotations.
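To make concrete why scribble supervision is "highly sparse and incomplete", the following is a minimal sketch of a partial cross-entropy loss, a common baseline for scribble-labeled data that averages the loss over annotated pixels only. It is not the scribble-supervised loss proposed in the paper, which additionally optimizes unlabeled pixels. The function name `partial_cross_entropy`, the `IGNORE` label value, and the array shapes are all assumptions made for illustration.

```python
import numpy as np

IGNORE = 255  # hypothetical label marking pixels that carry no scribble


def partial_cross_entropy(logits, scribble, ignore_label=IGNORE):
    """Mean cross-entropy over scribble-labeled pixels only.

    logits:   float array of shape (C, H, W) with unnormalized class scores
    scribble: int array of shape (H, W) holding a class id or ignore_label
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))

    # Only pixels touched by a scribble contribute to the loss.
    labeled = scribble != ignore_label
    if not labeled.any():
        return 0.0

    rows, cols = np.nonzero(labeled)
    classes = scribble[rows, cols]
    return float(-log_prob[classes, rows, cols].mean())


# Toy example: a 4x4 frame with two classes and only two annotated pixels.
logits = np.zeros((2, 4, 4))
scribble = np.full((4, 4), IGNORE)
scribble[1, 1] = 1  # one foreground scribble pixel
scribble[2, 2] = 0  # one background scribble pixel
loss = partial_cross_entropy(logits, scribble)
```

In this toy frame, 14 of the 16 pixels receive no gradient at all, which illustrates the gap that a loss acting on unlabeled pixels is meant to close.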


Pages: 339-353

Page count: 15

