AGUnet: Annotation-guided U-net for fast one-shot video object segmentation

Cited by: 18
Authors
Yin, Yingjie [1 ,2 ,3 ]
Xu, De [1 ,3 ]
Wang, Xingang [1 ,3 ]
Zhang, Lei [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Res Ctr Precis Sensing & Control, Beijing 100190, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hung Hom, Kowloon, Hong Kong, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fully-convolutional Siamese network; U-net; Interactive image segmentation; Video object segmentation;
DOI
10.1016/j.patcog.2020.107580
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The problem of semi-supervised video object segmentation has been popularly tackled by fine-tuning a general-purpose segmentation deep network on the annotated frame using hundreds of iterations of gradient descent. The time-consuming fine-tuning process, however, makes these methods difficult to use in practical applications. We propose a novel architecture called Annotation-Guided U-net (AGUnet) for fast one-shot video object segmentation (VOS). AGUnet can quickly adapt a model trained on static images to segmenting the given target in a video with only a few iterations of gradient descent. Our AGUnet is inspired by interactive image segmentation, where the target of interest is segmented using a user-annotated foreground. In AGUnet, however, a fully-convolutional Siamese network automatically annotates the foreground and background regions, and this annotation information is fused into the skip connections of a U-net for VOS. Our AGUnet can be trained end-to-end effectively on static images instead of the video sequences required by many previous methods. Experiments show that AGUnet runs much faster than current state-of-the-art one-shot VOS algorithms while achieving competitive accuracy, and it has high generalization capability. (c) 2020 Elsevier Ltd. All rights reserved.
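The fusion step described in the abstract can be sketched as follows: foreground/background annotation maps (produced in the paper by a fully-convolutional Siamese network) are concatenated with encoder features at a skip connection, then projected back to the original channel count with a 1x1 convolution. This is a minimal numpy illustration of that idea; the function name, the choice of a 1x1 convolution, and all tensor sizes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def fuse_annotation_into_skip(skip_feat, annotation, weight, bias):
    """Illustrative fusion of annotation maps into a U-net skip connection.

    skip_feat:  (C, H, W) encoder features at one skip connection
    annotation: (2, H, W) foreground/background annotation maps
                (e.g. produced by a Siamese matcher)
    weight:     (C, C + 2) weights of a 1x1 convolution that restores
                the original channel count after concatenation
    bias:       (C,) bias of that convolution
    """
    # Concatenate annotation channels with the skip features
    fused = np.concatenate([skip_feat, annotation], axis=0)   # (C + 2, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels
    out = np.einsum('oc,chw->ohw', weight, fused) + bias[:, None, None]
    return out

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
out = fuse_annotation_into_skip(
    rng.standard_normal((C, H, W)),
    rng.standard_normal((2, H, W)),
    rng.standard_normal((C, C + 2)),
    rng.standard_normal(C),
)
print(out.shape)  # (8, 16, 16)
```

Because the annotation maps enter at the skip connections rather than through the full encoder, only a small part of the network depends on the target-specific annotation, which is consistent with the abstract's claim that the model adapts to a new target in a few gradient-descent iterations.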
Pages: 10