Motion cues guided feature aggregation and enhancement for video object segmentation

Cited by: 6
Authors
Li, Xuejun [1]
Zheng, Wenming [1]
Zong, Yuan [1]
Affiliations
[1] Southeast Univ, Sch Biol Sci & Med Engn, Key Lab Child Dev & Learning Sci Minist Educ, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video object segmentation; Convolutional neural networks; Clustering; Optical flow; Feature fusion; CO-SEGMENTATION;
DOI
10.1016/j.neucom.2022.03.064
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video object segmentation (VOS) aims to separate unknown target objects from given video sequences. Although many recent methods, especially those based on deep convolutional neural networks (CNNs), have improved VOS performance, it remains difficult to aggregate deep features and motion cues effectively, which can be important for associating valid information across adjacent frames. To tackle this problem, we propose a simple yet effective feature optimization method for VOS based on motion information. We construct a two-branch deep network and use computed motion cues (i.e., optical flow) to jointly optimize global and local interframe correlation information. Additionally, a clustering-based feature enhancement module is proposed to further fuse motion information and enhance the feature saliency of the target area. The optimized feature maps yield a significant performance improvement on the final VOS task, especially for sequences with rapid target movement. Experiments on the DAVIS16, DAVIS17, YouTube-Objects and YouTube-VOS datasets demonstrate that our simple feature aggregation and enhancement method improves segmentation accuracy effectively and achieves strong results compared with many state-of-the-art methods. (c) 2022 Elsevier B.V. All rights reserved.
Pages: 176-190
Page count: 15
Related papers
56 entries in total
[1]   A Database and Evaluation Methodology for Optical Flow [J].
Baker, Simon ;
Scharstein, Daniel ;
Lewis, J. P. ;
Roth, Stefan ;
Black, Michael J. ;
Szeliski, Richard .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2011, 92 (01) :1-31
[2]   CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF [J].
Bao, Linchao ;
Wu, Baoyuan ;
Liu, Wei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :5977-5986
[3]   One-Shot Video Object Segmentation [J].
Caelles, S. ;
Maninis, K. -K. ;
Pont-Tuset, J. ;
Leal-Taixe, L. ;
Cremers, D. ;
Van Gool, L. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :5320-5329
[4]   Fast and Accurate Online Video Object Segmentation via Tracking Parts [J].
Cheng, Jingchun ;
Tsai, Yi-Hsuan ;
Hung, Wei-Chih ;
Wang, Shengjin ;
Yang, Ming-Hsuan .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7415-7424
[5]   Global Contrast based Salient Region Detection [J].
Cheng, Ming-Ming ;
Zhang, Guo-Xin ;
Mitra, Niloy J. ;
Huang, Xiaolei ;
Hu, Shi-Min .
2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011, :409-416
[6]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[7]   Video object segmentation based on motion-aware ROI prediction and adaptive reference updating [J].
Fu, Lihua ;
Zhao, Yu ;
Sun, Xiaowei ;
Huang, Jialiang ;
Wang, Dan ;
Ding, Yu .
EXPERT SYSTEMS WITH APPLICATIONS, 2021, 167
[8]  
Galasso Fabio, 2012, AS C COMP VIS, P760
[9]  
Girshick R, 2017, IEEE INT C COMPUT VI, P2961, DOI 10.1109/ICCV.2017.322
[10]   Video Co-segmentation for Meaningful Action Extraction [J].
Guo, Jiaming ;
Li, Zhuwen ;
Cheong, Loong-Fah ;
Zhou, Steven Zhiying .
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, :2232-2239