A temporal attention based appearance model for video object segmentation

被引:3
作者
Wang, Hui [1 ]
Liu, Weibin [1 ]
Xing, Weiwei [2 ]
机构
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China
基金
中国国家自然科学基金; 北京市自然科学基金;
关键词
Video object segmentation; Convolutional neural networks; Appearance model; Mixture loss;
D O I
10.1007/s10489-021-02547-4
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
More and more researchers have recently paid attention to video object segmentation because it is an important building block for numerous computer vision applications. Although many algorithms promote its development, there are still some open challenges. Efficient and robust pipelines are needed to address appearance changes and the distraction from similar background objects in the video object segmentation. This paper proposes a novel neural network that integrates a temporal attention based appearance model and a boundary-aware loss. The appearance model fuses the appearance information of the first frame, the previous frame, and the current frame in the feature space, which assists the proposed method to learn a discriminative and robust target representation and avoid the drift problem of traditional propagation schemes. Moreover, the boundary-aware loss is employed for network training. Equipped with the boundary-aware loss, the proposed method achieves more accurate segmentation results with clear boundaries. The proposed method is compared with several recent state-of-the-art algorithms on popular benchmark datasets. Comprehensive experiments show that the proposed method achieves favorable performance with a high frame rate.
引用
收藏
页码:2290 / 2300
页数:11
相关论文
共 50 条
[1]  
[Anonymous], 2016, CoRR abs/1604.04339
[2]   CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF [J].
Bao, Linchao ;
Wu, Baoyuan ;
Liu, Wei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :5977-5986
[3]   Similarity analysis of video sequences using an artificial neural network [J].
Bhandarkar, SM ;
Chen, F .
APPLIED INTELLIGENCE, 2005, 22 (03) :251-275
[4]   One-Shot Video Object Segmentation [J].
Caelles, S. ;
Maninis, K. -K. ;
Pont-Tuset, J. ;
Leal-Taixe, L. ;
Cremers, D. ;
Van Gool, L. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :5320-5329
[5]   Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning [J].
Chen, Yuhua ;
Pont-Tuset, Jordi ;
Montes, Alberto ;
Van Gool, Luc .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :1189-1198
[6]   Global Contrast based Salient Region Detection [J].
Cheng, Ming-Ming ;
Zhang, Guo-Xin ;
Mitra, Niloy J. ;
Huang, Xiaolei ;
Hu, Shi-Min .
2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011, :409-416
[7]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[8]   The PASCAL Visual Object Classes Challenge: A Retrospective [J].
Everingham, Mark ;
Eslami, S. M. Ali ;
Van Gool, Luc ;
Williams, Christopher K. I. ;
Winn, John ;
Zisserman, Andrew .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 111 (01) :98-136
[9]   Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground [J].
Fan, Deng-Ping ;
Cheng, Ming-Ming ;
Liu, Jiang-Jiang ;
Gao, Shang-Hua ;
Hou, Qibin ;
Borji, Ali .
COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 :196-212
[10]  
He K., 2016, P IEEE C COMPUTER VI, DOI [DOI 10.1109/CVPR.2016.90, 10.1109/CVPR.2016.90]