AGUnet: Annotation-guided U-net for fast one-shot video object segmentation

Cited by: 18
Authors
Yin, Yingjie [1 ,2 ,3 ]
Xu, De [1 ,3 ]
Wang, Xingang [1 ,3 ]
Zhang, Lei [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Res Ctr Precis Sensing & Control, Beijing 100190, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hung Hom, Kowloon, Hong Kong, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Fully-convolutional Siamese network; U-net; Interactive image segmentation; Video object segmentation;
DOI
10.1016/j.patcog.2020.107580
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The problem of semi-supervised video object segmentation has commonly been tackled by fine-tuning a general-purpose segmentation deep network on the annotated frame with hundreds of iterations of gradient descent. This time-consuming fine-tuning process, however, makes these methods difficult to use in practical applications. We propose a novel architecture called Annotation-Guided U-net (AGUnet) for fast one-shot video object segmentation (VOS). AGUnet can quickly adapt a model trained on static images to segmenting a given target in a video with only a few iterations of gradient descent. Our AGUnet is inspired by interactive image segmentation, where the target of interest is segmented using a user-annotated foreground. In AGUnet, however, a fully-convolutional Siamese network automatically annotates the foreground and background regions, and this annotation information is fused into the skip connections of a U-net for VOS. AGUnet can be trained end-to-end effectively on static images rather than on video sequences, as required by many previous methods. Experiments show that AGUnet runs much faster than current state-of-the-art one-shot VOS algorithms while achieving competitive accuracy, and that it generalizes well. (c) 2020 Elsevier Ltd. All rights reserved.
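The abstract describes the core mechanism only at a high level: annotation maps predicted by a Siamese branch are fused into the skip connections of a U-net. Below is a minimal PyTorch sketch of that fusion idea, assuming the foreground/background annotation arrives as a two-channel soft map (the Siamese matching branch itself is not modeled here). All names (AGUnetSketch, ConvBlock, base, annot) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """Two 3x3 conv + ReLU layers, the usual U-net building block."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class AGUnetSketch(nn.Module):
    """Toy two-level U-net whose skip connections are concatenated with
    downsampled foreground/background annotation maps (2 extra channels),
    mimicking the annotation-guided fusion described in the abstract."""

    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.enc1 = ConvBlock(in_ch, base)
        self.enc2 = ConvBlock(base, base * 2)
        self.bottleneck = ConvBlock(base * 2, base * 4)
        # Each decoder block sees: upsampled features + skip features
        # + 2 annotation channels injected into the skip connection.
        self.dec2 = ConvBlock(base * 4 + base * 2 + 2, base * 2)
        self.dec1 = ConvBlock(base * 2 + base + 2, base)
        self.head = nn.Conv2d(base, 1, 1)

    def forward(self, img, annot):
        # annot: (B, 2, H, W) soft fg/bg maps; in the paper these would come
        # from the Siamese annotation branch, here they are just an input.
        s1 = self.enc1(img)
        s2 = self.enc2(F.max_pool2d(s1, 2))
        b = self.bottleneck(F.max_pool2d(s2, 2))

        a2 = F.interpolate(annot, size=s2.shape[-2:], mode="bilinear", align_corners=False)
        u2 = F.interpolate(b, size=s2.shape[-2:], mode="bilinear", align_corners=False)
        d2 = self.dec2(torch.cat([u2, s2, a2], dim=1))

        a1 = F.interpolate(annot, size=s1.shape[-2:], mode="bilinear", align_corners=False)
        u1 = F.interpolate(d2, size=s1.shape[-2:], mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([u1, s1, a1], dim=1))
        return torch.sigmoid(self.head(d1))


if __name__ == "__main__":
    net = AGUnetSketch()
    img = torch.randn(1, 3, 64, 64)
    annot = torch.rand(1, 2, 64, 64)  # stand-in for Siamese-produced maps
    print(net(img, annot).shape)      # torch.Size([1, 1, 64, 64])
```

Under this reading, fast one-shot adaptation would amount to a handful of gradient-descent steps on such a network using the first annotated frame, rather than the hundreds of fine-tuning iterations the abstract contrasts against.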
Pages: 10
Related References
40 records in total
[11] Cheng M.-M., Mitra N.J., Huang X., Torr P.H.S., Hu S.-M. Global Contrast Based Salient Region Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 569-582.
[12] Dosovitskiy A., Fischer P., Ilg E., Haeusser P., Hazirbas C., Golkov V., van der Smagt P., Cremers D., Brox T. FlowNet: Learning Optical Flow with Convolutional Networks. IEEE International Conference on Computer Vision (ICCV), 2015: 2758-2766.
[13] Everingham M. The PASCAL Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2010, 88: 303. DOI: 10.1007/s11263-009-0275-4.
[14] Fan Q. ACM Transactions on Graphics, 2015, 34: 1.
[15] Hu Y.T. European Conference on Computer Vision (ECCV), 2018: 54.
[16] Khoreva A., Benenson R., Ilg E., Brox T., Schiele B. Lucid Data Dreaming for Video Object Segmentation. International Journal of Computer Vision, 2019, 127(9): 1175-1197.
[17] Li F., Kim T., Humayun A., Tsai D., Rehg J.M. Video Segmentation by Tracking Many Figure-Ground Segments. IEEE International Conference on Computer Vision (ICCV), 2013: 2192-2199.
[18] Lin T.-Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollar P., Zitnick C.L. Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, Part V, LNCS 8693: 740-755.
[19] Liu X., Zhou Y., Zhao J., Yao R., Liu B., Zheng Y. Siamese Convolutional Neural Networks for Remote Sensing Scene Classification. IEEE Geoscience and Remote Sensing Letters, 2019, 16(8): 1200-1204.
[20] Long J. Fully Convolutional Networks for Semantic Segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 3431. DOI: 10.1109/CVPR.2015.7298965.