AGUnet: Annotation-guided U-net for fast one-shot video object segmentation

Cited by: 18

Authors:

Yin, Yingjie [1, 2, 3]

Xu, De [1, 3]

Wang, Xingang [1, 3]

Zhang, Lei [2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Res Ctr Precis Sensing & Control, Beijing 100190, Peoples R China

[2] Hong Kong Polytech Univ, Dept Comp, Hung Hom, Kowloon, Hong Kong, Peoples R China

[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

Source:

PATTERN RECOGNITION | 2021 / Vol. 110

Funding:

National Natural Science Foundation of China;

Keywords:

Fully-convolutional Siamese network; U-net; Interactive image segmentation; Video object segmentation;

DOI:

10.1016/j.patcog.2020.107580

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

The problem of semi-supervised video object segmentation has commonly been tackled by fine-tuning a general-purpose segmentation deep network on the annotated frame using hundreds of iterations of gradient descent. The time-consuming fine-tuning process, however, makes these methods difficult to use in practical applications. We propose a novel architecture called Annotation Guided U-net (AGUnet) for fast one-shot video object segmentation (VOS). AGUnet can quickly adapt a model trained on static images to segment the given target in a video with only a few iterations of gradient descent. Our AGUnet is inspired by interactive image segmentation, where the target of interest is segmented using a user-annotated foreground. However, in AGUnet we use a fully-convolutional Siamese network to automatically annotate the foreground and background regions and fuse this annotation information into the skip connection of a U-net for VOS. Our AGUnet can be trained end-to-end effectively on static images rather than on video sequences, as many previous methods require. The experiments show that AGUnet runs much faster than current state-of-the-art one-shot VOS algorithms while achieving competitive accuracy, and that it generalizes well. (c) 2020 Elsevier Ltd. All rights reserved.
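The abstract describes the mechanism only at a high level. As a rough illustration, here is a minimal pure-Python sketch, under stated assumptions and not the authors' implementation, of two ingredients it names: SiamFC-style cross-correlation of a target template against a search feature map (Bertinetto et al., reference [5]), and fusion of the resulting foreground/background annotation into a U-net skip connection. The helper names (`cross_correlate_same`, `annotate`, `fuse_skip`), the single-channel maps, the zero padding, the two thresholds, and channel concatenation as the fusion rule are all hypothetical choices made for illustration; the abstract does not say how AGUnet derives or fuses its annotations.

```python
from typing import List, Tuple

# A single-channel 2D feature map stored row-major (a simplification:
# real SiamFC features have many channels whose correlations are summed).
Map = List[List[float]]


def cross_correlate_same(search: Map, template: Map) -> Map:
    """Slide `template` over `search` (no kernel flip) and record the dot
    product at every position. Zero padding keeps the output the same size
    as `search` (an assumption; SiamFC itself uses a 'valid' correlation)."""
    th, tw = len(template), len(template[0])
    if th % 2 == 0 or tw % 2 == 0:
        raise ValueError("template sides must be odd so it has a centre cell")
    ph, pw = th // 2, tw // 2
    sh, sw = len(search), len(search[0])
    out = [[0.0] * sw for _ in range(sh)]
    for i in range(sh):
        for j in range(sw):
            acc = 0.0
            for u in range(th):
                for v in range(tw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < sh and 0 <= x < sw:  # outside = zero padding
                        acc += search[y][x] * template[u][v]
            out[i][j] = acc
    return out


def annotate(response: Map, fg_thresh: float, bg_thresh: float) -> Tuple[Map, Map]:
    """Hypothetical rule: high responses -> foreground, low -> background.
    Cells between the two thresholds are left unannotated in both maps."""
    fg = [[1.0 if r >= fg_thresh else 0.0 for r in row] for row in response]
    bg = [[1.0 if r <= bg_thresh else 0.0 for r in row] for row in response]
    return fg, bg


def fuse_skip(skip: List[Map], fg: Map, bg: Map) -> List[Map]:
    """Hypothetical fusion: append the annotation maps as two extra channels
    to the encoder features carried by a U-net skip connection."""
    h, w = len(fg), len(fg[0])
    for ch in skip:
        if len(ch) != h or len(ch[0]) != w:
            raise ValueError("skip features and annotations must share a size")
    return skip + [fg, bg]


if __name__ == "__main__":
    # Toy search map: a 3x3 "target" of ones centred at (2, 2) in a 5x5 map.
    search = [[1.0 if 1 <= i <= 3 and 1 <= j <= 3 else 0.0 for j in range(5)]
              for i in range(5)]
    template = [[1.0] * 3 for _ in range(3)]  # toy target template
    response = cross_correlate_same(search, template)
    fg, bg = annotate(response, fg_thresh=6.0, bg_thresh=1.0)
    skip = [[[0.5] * 5 for _ in range(5)]]  # one toy encoder channel
    fused = fuse_skip(skip, fg, bg)
    print(response[2][2], len(fused))  # 9.0 3
```

Running the script prints `9.0 3`: the response peaks where the template fully overlaps the toy target, and the fused skip features hold the original channel plus the two annotation channels. In a real network these would be multi-channel tensors in a deep-learning framework, and the annotation maps would likely need resizing to each skip connection's resolution.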

References

Pages: 10

40 references in total

[1] [Anonymous], 2017, CVPR

[2] [Anonymous], 2018, The 2017 davis challenge on video object segmentation

[3] [Anonymous], 2017, P IEEE C COMP VIS PA

[4] [Anonymous], 2017, IEEE CVPR

[5] Fully-Convolutional Siamese Networks for Object Tracking. Bertinetto, Luca; Valmadre, Jack; Henriques, Joao F.; Vedaldi, Andrea; Torr, Philip H. S. COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914: 850-865.

[6] Albumentations: Fast and Flexible Image Augmentations. Buslaev, Alexander; Iglovikov, Vladimir I.; Khvedchenya, Eugene; Parinov, Alex; Druzhinin, Mikhail; Kalinin, Alexandr A. INFORMATION, 2020, 11 (02).

[7] A Video Representation Using Temporal Superpixels. Chang, Jason; Wei, Donglai; Fisher, John W., III. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013: 2051-2058.

[8] Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning. Chen, Yuhua; Pont-Tuset, Jordi; Montes, Alberto; Van Gool, Luc. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 1189-1198.

[9] Fast and Accurate Online Video Object Segmentation via Tracking Parts. Cheng, Jingchun; Tsai, Yi-Hsuan; Hung, Wei-Chih; Wang, Shengjin; Yang, Ming-Hsuan. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7415-7424.

[10] SegFlow: Joint Learning for Video Object Segmentation and Optical Flow. Cheng, Jingchun; Tsai, Yi-Hsuan; Wang, Shengjin; Yang, Ming-Hsuan. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 686-695.
